4 Real-World Amazon Redshift Use Cases
At intermix.io, we spend all day helping our customers optimize their performance on Amazon Redshift, so we have a front-row view of all the ways Redshift can help businesses manage their data. Redshift is a versatile product that can help businesses aggregate, store, analyze, and share their data. We've collected 4 Amazon Redshift use cases that help businesses get more out of their data.
Redshift started out as a simpler, cheaper, and faster alternative to legacy on-premises warehouses. Fast forward to now, and the use cases are way more sophisticated than just running a data warehouse in the cloud.
In this article, you will find 4 examples of Amazon Redshift use cases:
- Traditional Data Warehousing
- Log Analysis
- Business Applications
- Mission-critical Workloads
Collect Data through Traditional Data Warehouses
Data warehouse technology has been around since Kimball and Inmon. What changed with Amazon Redshift was the price at which you can get it – about 20x less than what you had to carve out for legacy vendors like Oracle and Teradata.
The use case for data warehousing is to unify disparate data sources in a single place and run custom analytics for your business.
Let’s say you’re the head of business intelligence for a web property that also has a mobile app. The typical categories of data sources are:
- Core production database with all customer data (“who are my customers?”)
- Event data from your website and your mobile app (“how are they behaving when they use our products?”)
- Data from your SaaS systems that you need to support your business (ads, payments, support, etc.) (“How did I acquire those customers, how much are they paying me, and what support costs do they cause me?”)
With a rich ecosystem of data integration vendors, it’s easy to build pipelines to those sources and feed data into Redshift. Put a powerful BI / dashboard tool on top, and you have a full-blown BI stack.
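To make the ingestion step concrete, here is a minimal sketch of the bulk-load pattern Redshift uses: a COPY statement that loads files from S3 in parallel. The table name, S3 path, and IAM role below are hypothetical placeholders; a real pipeline would execute the generated SQL through a Postgres driver such as psycopg2.

```python
def build_copy_statement(table, s3_path, iam_role, fmt="JSON 'auto'"):
    """Build a Redshift COPY statement that bulk-loads files from S3.

    COPY is the idiomatic way to ingest into Redshift: it loads in
    parallel across node slices, unlike row-by-row INSERTs.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT {fmt};"
    )

# Hypothetical example: load raw click events dropped into S3 by a
# data integration vendor.
sql = build_copy_statement(
    table="events.clicks",
    s3_path="s3://my-data-lake/clicks/2019/01/",
    iam_role="arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(sql)
```

From there, pointing a BI tool at the loaded tables is all that remains to complete the stack.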
A key advantage of Redshift is simplicity. It used to take months to get a data warehouse up and running. And you’d need the help of an Accenture or IBM. None of that anymore. You can spin up a Redshift cluster in less than 15 minutes, and build a whole business intelligence stack in a weekend.
Take the case of NTT Docomo, for instance. The company built a scalable, secure data warehouse on Amazon Redshift.
The combination of price / speed / simplicity has expanded the addressable market for data warehousing from large corporations to SMBs. However, because it is so easy to get going, data engineers must follow best practices when setting up their cluster to avoid performance issues. As the volume of data grows, so does the complexity of data pipelines, so any data solution should be easily scalable.
Store & Process Data with Log Analysis
Previous generations of data warehouses had to aggregate data because it was too expensive to store raw data. That changed with Amazon Redshift. Since Redshift is cheap, it’s possible to store raw, event-level data without exorbitant storage costs.
There are three key benefits to using event-level data:
- You get to keep the maximum amount of fidelity – no information gets lost in aggregation.
- Your level of analytical insight goes up. Geared with granular information, you can “slice ’n dice” your data in any possible way along any dimension.
- You can run historic replays of data, see what happened in the build-up to a specific event you’re tracking, and build “what-if” type of scenarios by changing the parameters of your algorithms / models.
Since Redshift is fast and cheap, processing machine data is cost-effective. You can also drive the time required for “ingest-to-insight” (i.e. the time between pushing data into Redshift and the final analysis output) below the 5-minute mark. And not just for basic aggregations, but complex 10-way joins, across billions (billions with a “b”) of rows. That’s remarkable computational power.
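To make the fidelity point above concrete, here is a minimal sketch in plain Python, using made-up event records. The point is that raw, event-level rows can be re-aggregated along any dimension after the fact, which a pre-computed daily rollup cannot do.

```python
from collections import Counter

# Hypothetical raw, event-level records -- one row per click.
events = [
    {"user": "a", "platform": "web",    "country": "US"},
    {"user": "b", "platform": "mobile", "country": "US"},
    {"user": "a", "platform": "mobile", "country": "DE"},
    {"user": "c", "platform": "web",    "country": "DE"},
]

def slice_by(events, dimension):
    """Aggregate raw events along any dimension, chosen after the fact."""
    return Counter(e[dimension] for e in events)

# The same raw data answers questions a fixed rollup could not:
print(slice_by(events, "platform"))  # Counter({'web': 2, 'mobile': 2})
print(slice_by(events, "country"))   # Counter({'US': 2, 'DE': 2})
```

In Redshift the equivalent is simply a `GROUP BY` over the raw event table; because the events were never aggregated away, every new question is just another query.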
Storing and processing raw data is one of the best Amazon Redshift use cases for businesses that deal with machine-generated data. This is high-velocity data, such as data from weblogs and clickstreams.
The business value of processing raw data at high speeds comes from exposing that data back to the business, enabling better insights and new data-driven services. Here are two examples of companies that leveraged Amazon Redshift for real-time streaming analytics:
- iflix uses Redshift-based architecture for real-time analytics
- Betterment uses Redshift for advanced real-time analytics
The business value here goes beyond mere cost savings by migrating your warehouse to the cloud. Rather, you’re enabling new services, informed by data. These “data-driven services” are the foundation for better / faster decision making. They can also be new revenue-generating products.
That distinction is key. Previously, companies would look at a data warehouse as a “cost center”. So, the goal was to keep that cost down as much as possible by limiting the exposure of data to a limited set of people, among other tactics.
With the advent of the cloud and Amazon Redshift, it makes sense to increase spend on your data infrastructure. An increase in spending on data analysis can lead to large increments in revenues.
That brings us to the next Amazon Redshift use case:
Quickly Analyze Data for Business Applications
Not all companies have the technical abilities and budget to build and run a custom streaming pipeline with near real-time analytics.
But analytical use cases can be pretty similar across a single industry or vertical. That has given rise to “analytics-as-a-service” vendors, who use Redshift under the covers to offer analytics in a SaaS model to their customers.
These vendors either run a single cluster in a multi-tenant model, or dedicate a cluster per customer as a premium option. Take Acquia Lift as an example. The pricing model is a subscription fee to the analytics service.
To give you a rough idea of the economics of an analytics service in a SaaS model: in a multi-tenant setup, you can cram data from 10s of customers onto a single-node cluster, which costs you ~$200 / month. With an average price for analytics services of about $500 / month / subscriber, the cluster pays for itself with the first customer, and every additional subscriber is nearly pure margin.
Time-Sensitive Data Reporting for Mission-Critical Workloads
One of the common use cases for Amazon Redshift is to use it for mission-critical workloads. Here, data sitting in Redshift feeds into time-sensitive apps. It’s key that the database stays up; otherwise, the business goes down (quite literally).
A common example of time-sensitive data reporting is stock exchanges, such as the London Stock Exchange. There is daily reporting involved, and the reporting can’t be late or wrong.
Other use cases include building predictive models on top of Redshift, and then embedding the results programmatically into another app via a data API. An example is automated ad-bidding, where bids across certain ad networks are adjusted on a near real-time basis. The adjustments are calculated from the ROI and performance of ad types over a certain time period.
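A bidding adjustment of that kind can be sketched in a few lines. The target ROI, step size, and numbers below are illustrative assumptions, not a real bidding algorithm; in practice the revenue and spend figures would come from a query against Redshift over the chosen time window.

```python
def adjust_bid(current_bid, revenue, spend, target_roi=1.5, step=0.10):
    """Nudge a bid up or down based on recent ROI for an ad type.

    ROI here is simply revenue per dollar of spend over some window
    (an illustrative simplification; real systems use richer models).
    """
    roi = revenue / spend if spend else 0.0
    if roi > target_roi:   # ad is profitable: bid more aggressively
        return round(current_bid * (1 + step), 4)
    if roi < target_roi:   # ad is underperforming: pull back
        return round(current_bid * (1 - step), 4)
    return current_bid

# Hypothetical figures for one ad network over the last hour:
print(adjust_bid(2.00, revenue=450.0, spend=200.0))  # ROI 2.25 -> bid 2.2
print(adjust_bid(2.00, revenue=150.0, spend=200.0))  # ROI 0.75 -> bid 1.8
```

A data API would serve these adjusted bids back to the ad networks, closing the loop from warehouse to application.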
Redshift has driven down the cost of running a data warehouse. It has also opened up opportunities for small and medium enterprises to deliver data-driven services and create new revenue streams.
However, Redshift can be a difficult beast to tame. To get the most out of your data, it is important that your cluster is running efficiently at all times. If you’re part of a data team that’s building mission-critical data pipelines, sign up for a free trial of intermix.io to improve your Redshift performance.