How Cloudflare Built Their Data Pipeline with Amazon Redshift

Cloudflare 

Cloudflare is a web performance and security company that provides services to protect and accelerate websites. Content distribution, web optimization, web security, and analytics are a few examples of the company’s business range.

While different services may require different data stacks, they are all built on top of Cloudflare’s core infrastructure. At the core of their data stack are Kafka clusters as the streaming platform and CitusDB, a distributed version of PostgreSQL, as the data warehouse. Data is ingested through Cloudflare’s edge services over HTTP, passed on to the Kafka clusters, and then stored in the CitusDB warehouse.

A nice example of a service working on top of this infrastructure is DNS Analytics, a service that processes around 1 million DNS queries per second! The DNS edge service pre-processes and aggregates the data before sending it, encrypted, to one of Cloudflare’s data centers. Within the data center, the data is de-multiplexed and pushed into several Apache Kafka clusters, which deliver it to consumers grouped by Kafka topic. Consumers store the processed information in the corresponding databases, which the company’s API services later query to deliver information to customers.
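To make the flow concrete, here is a minimal, in-memory sketch of the three stages described above: aggregation at the edge, de-multiplexing by topic in the data center, and consumers writing to a store. It is only an illustration, not Cloudflare’s code. Plain Python dictionaries stand in for Kafka and CitusDB, and the function names, topic names, and record fields are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def pre_aggregate(raw_queries):
    """Edge step (hypothetical): collapse identical raw queries into counts.

    Each raw query is a (topic, name, qtype) tuple; the shape is an assumption.
    """
    counts = Counter(raw_queries)
    return [
        {"topic": topic, "name": name, "qtype": qtype, "count": count}
        for (topic, name, qtype), count in counts.items()
    ]


def demultiplex(batch):
    """Data-center step (hypothetical): split a mixed batch into per-topic lists.

    The dict of lists stands in for Kafka topics.
    """
    topics = defaultdict(list)
    for record in batch:
        topics[record["topic"]].append(record)
    return topics


def consume(topics, store):
    """Consumer step (hypothetical): add each topic's counts to its own table.

    The nested dict stands in for tables in the warehouse.
    """
    for topic, records in topics.items():
        table = store.setdefault(topic, Counter())
        for record in records:
            table[(record["name"], record["qtype"])] += record["count"]
    return store


raw = [
    ("dns_queries", "example.com", "A"),
    ("dns_queries", "example.com", "A"),
    ("dns_queries", "example.org", "AAAA"),
    ("dns_errors", "bad.example", "A"),
]
store = consume(demultiplex(pre_aggregate(raw)), {})
print(store["dns_queries"][("example.com", "A")])
```

Aggregating before sending is the design choice that matters here: the two identical `example.com` queries travel as a single record with a count of 2, which is how an edge service can cut down the volume shipped to the data center.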

Cloudflare serves millions of websites around the world, processing and storing hundreds of terabytes of data daily. Interestingly, Cloudflare is not a fan of commercial cloud technologies; instead, the company runs its own data centers across the world, 152 in total at the time of writing.

Fig: Cloudflare’s DNS Analytics system.

Fig: High-level view of Cloudflare’s data architecture.

Mike Pavloski
