How Dollar Shave Club Built Their Data Pipeline with Amazon Redshift
Dollar Shave Club
Dollar Shave Club (DSC) is a lifestyle brand and e-commerce company that’s revolutionizing the bathroom by inventing smart, affordable products. Don’t be fooled by their name, they have a pretty cool data architecture, for a company in the shaving business. Their business model works with online sales through a subscription service. Currently, they serve around 3 million subscribed customers.
DSC’s web applications, internal services, and data infrastructure are 100% hosted on AWS. A Redshift cluster serves as the central data warehouse, receiving data from various systems. Data movement is facilitated with Apache Kafka and can move in different directions – from production DBs into the warehouse, in between different apps and in between internal pipeline components. There’s also Snowplow which collects data from the web and mobile clients. Once data reaches Redshift, it is accessed through various analytics platforms for monitoring, visualization, and insights. The main tool for the job is, of course, Apache Spark, which is mainly used to build predictive models, such as recommender systems for future sales.
Download the Data Pipeline Resource Bundle
You'll get additional resources like:
- Full stack breakdown and tech checklist
- Summary slides with links to resources
- PDF version of the blog post
Join 11,000 of your peers.
Subscribe to our newsletter SF Data.
People at Facebook, Amazon and Uber read it every week.
Every Monday morning we'll send you a roundup of the best content from intermix.io and around the web. Make sure you're ready for the week! See all issues.