How Teads Built Their Data Pipeline with Amazon Redshift

Teads 

Teads is a video advertising marketplace, often ranked as the number 1 video platform in the world. Working with data-heavy videos must be supported by a powerful data infrastructure, but that’s not the end of the story. Teads’ business needs to log user interactions with their videos through the browser (like play, pause, resume, complete…), which count up to 10 million events per day. Another source of data is video auctions (real-time bidding processes) which generate another 60 million events per day. To build their complex data infrastructure, Teads has turned to both Google and Amazon for help.

Originally the data stack at Teads was based on a lambda architecture, using Storm, Spark and Cassandra. This architecture couldn’t scale well, so the company turned toward Google’s BigQuery in 2016. They already had their Kafka clusters on AWS, which was also running some of their ad delivery components, so the company chose a multi-cloud infrastructure. Transferring data between different cloud providers can get expensive and slow. To address the second part of this issue, Teads placed their AWS and GCP clouds as close as possible and connected them with managed VPNs.

So how does their complex multi-cloud data stack look like? Well, first of all, data coming from users’ browsers and data coming from ad auctions is enqueued in Kafka topics in AWS. Then using an inter-cloud link, data is passed over to GCP’s Dataflow which is then well paired with BigQuery in the next step. Having all data in a single warehouse means half of the work is done. The next step would be to deliver data to consumers, and Analytics is one of them. The Analytics service at Teads is a Scala-based app that queries data from the warehouse and stores it to tailored data marts. Interestingly, the data marts are actually AWS Redshift servers. In the final step, data is presented into intra-company dashboards, and the user’s web apps.

Abstract view of Teads’ data flow
Fig: An abstract view of Teads’ data flow, from ingestion to .  

Teads’ analytics part of the data stack
Fig: Teads’ analytics part of the data stack. 
Source: https://medium.com/teads-engineering

Download the Data Pipeline Resource Bundle

You'll get additional resources like:

  • Full stack breakdown and tech checklist
  • Summary slides with links to resources
  • PDF version of the blog post
Mike Pavloski

Mike Pavloski

Join 11,000 of your peers.
Subscribe to our newsletter SF Data.
People at Facebook, Amazon and Uber read it every week.

Every Monday morning we'll send you a roundup of the best content from intermix.io and around the web. Make sure you're ready for the week! See all issues.