How 500px Built Their Data Pipeline with Amazon Redshift


The data architecture at 500px rests mainly on two tools: Redshift for data storage and Periscope for analytics, reporting, and visualization. On the customer-facing side, the company’s web and mobile apps run on top of a few API servers, backed by several databases, mostly MySQL. Data from these databases is processed by a Luigi ETL pipeline and then stored in S3 and Redshift. Splunk handles querying and summarizing text-based logs. Periscope Data is responsible for building data insights and sharing them across teams. As of 2015, this infrastructure supported around 60 people spread across a couple of teams within the company.

Fig: Some of the data-related technologies used at 500px.
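
The flow described above (application database, ETL step, staging storage, warehouse) can be sketched in a few lines. This is a minimal illustration and not 500px's actual pipeline: it uses only the Python standard library, with in-memory SQLite databases standing in for MySQL and Redshift, and a local CSV file standing in for S3. It also does not use Luigi; the two functions simply run in order. The `photos` table, its columns, and all function names are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

COLUMNS = ["id", "user_id", "title"]  # hypothetical schema, for illustration only


def extract(source: sqlite3.Connection, staging_file: Path) -> int:
    """Dump the source table to a CSV file (stand-in for staging data in S3)."""
    rows = source.execute(
        "SELECT id, user_id, title FROM photos ORDER BY id"
    ).fetchall()
    with staging_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)


def load(staging_file: Path, warehouse: sqlite3.Connection) -> int:
    """Load the staged CSV into the warehouse table (stand-in for Redshift)."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS photos "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
    )
    with staging_file.open(newline="") as fh:
        records = [
            (int(r["id"]), int(r["user_id"]), r["title"])
            for r in csv.DictReader(fh)
        ]
    # INSERT OR REPLACE keeps a re-run from duplicating rows.
    warehouse.executemany("INSERT OR REPLACE INTO photos VALUES (?, ?, ?)", records)
    warehouse.commit()
    return len(records)


def run_pipeline() -> list:
    """Run extract then load against throwaway in-memory databases."""
    source = sqlite3.connect(":memory:")  # stand-in for the MySQL app database
    source.execute(
        "CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
    )
    source.executemany(
        "INSERT INTO photos VALUES (?, ?, ?)",
        [(1, 10, "Sunrise"), (2, 11, "Harbour")],  # made-up sample rows
    )
    warehouse = sqlite3.connect(":memory:")  # stand-in for Redshift

    with tempfile.TemporaryDirectory() as tmp:
        staging_file = Path(tmp) / "photos.csv"
        extract(source, staging_file)
        load(staging_file, warehouse)

    return warehouse.execute(
        "SELECT id, user_id, title FROM photos ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Staging the extract in a file before loading it keeps the two steps independent, so a failed load can be retried without querying the application database again. A workflow tool such as Luigi would add dependency tracking and scheduling on top of steps like these.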

Mike Pavloski

