The rise of data products and what that means for intermix.io
Existing users should read this blog post on specific changes and frequently asked questions.
We helped you manage performance and scale your cluster
A few years ago, performance was the most important thing. Everybody faced the same struggle: fixing slow dashboards, tuning the Redshift WLM, finding tables that needed vacuuming, and generally feeling comfortable that when the cluster rebooted because it ran out of disk space, you at least knew why.
A lot of time went to managing and scaling a cluster, trying to keep the lights on, while query volume and the overall amount of stored data mounted. Using intermix.io, along with our Slack channel and expert advice, made the pesky task of managing a cluster much easier. We gave you cluster recommendations, intuitive WLM charts and growth rates for tables, along with the right vacuum scripts.
We helped you solve those problems with a single, opinionated dashboard. Every customer got the same dashboard, with the same charts, showing you the same metrics. And in a world where everybody has the same challenges, it makes sense to have a product that works the same way for everybody.
The needs of data teams have evolved
Data teams were small when we first started. The primary use cases were basic reporting and pretty dashboards. There were two primary roles:
- data engineers developed bespoke data integration code to move raw data into the data warehouse, and
- analysts would query that raw data to build reports and dashboards.
But there has been a lot of innovation in analytics and data engineering, and as a result, the work and expectations for data teams have changed. Teams have shifted from a reactive “question / answer” analytics world to being part of operations, embedded in product teams, and behaving as partners in the product development process.
Three key changes are happening:
1. Data Products
The data warehouse has emerged as the nerve-center for data. The actual data inside of the warehouse now feeds into digital assets, customer-facing products, and other applications that serve internal users.
Data is becoming more and more mission-critical. Real-world examples of data products include personalization of product experiences, pricing algorithms for marketplaces, supply chain management for WiFi routers, and daily settlement of commodity exchange contracts.
The result of all this is that data has become a product. We’ve started compiling stories from our customers on how they build data products at their own companies.
2. Specialization of roles with different problems and needs
Data integration used to be tedious, manual work. Today, it’s largely automated by tools like Fivetran, Matillion and Stitch. These tools have turned data integration into a matter of a few clicks. It has never been easier to get data in its raw form into a warehouse.
At the same time, visualization and exploration tools like Metabase, Periscope, and Looker introduced easy self-service access to data and gave rise to the ‘citizen analyst’. These tools made it easy to create and share insights – work that would otherwise require a data engineer.
But easy access to data created a host of new issues – duplicate work, diverging metrics, and an explosion of queries.
Data teams realized they needed to more tightly control the data and provide a single version of the ‘truth’ to their organization. In response, they started using tools like Airflow, dbt and Matillion to establish a ‘semantic’ layer within the data warehouse – an abstraction layer that gets built by modeling and running the most complex transformations in SQL on top of raw data.
Data in this new modeling layer is curated and prepared. It is of high quality and purpose-built. It’s cleaned, well-defined, transformed, tested, and documented. Because of the high quality of this data and the associated documentation, business users are able to bring the tools of their choice to access data, while getting reliable, consistent results.
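To make the idea of a modeling layer concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module as a stand-in for a warehouse. The table `raw_orders`, the view `customer_revenue`, and the sample rows are all hypothetical and only serve as illustration. The view plays the role of a curated model built in SQL on top of raw data, and the final query mimics the kind of uniqueness test a team might run against it.

```python
import sqlite3

# In-memory database as a stand-in for a warehouse (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw layer: data as a loader delivered it, including a duplicate row.
CREATE TABLE raw_orders (
    order_id     INTEGER,
    customer     TEXT,
    amount_cents INTEGER,
    status       TEXT
);
INSERT INTO raw_orders VALUES
    (1, 'acme',   1250, 'complete'),
    (2, 'acme',    800, 'cancelled'),
    (3, 'globex', 4300, 'complete'),
    (3, 'globex', 4300, 'complete');

-- Modeling layer: deduplicated, filtered, and expressed in business terms.
CREATE VIEW customer_revenue AS
SELECT customer, SUM(amount_cents) / 100.0 AS revenue
FROM (
    SELECT DISTINCT order_id, customer, amount_cents
    FROM raw_orders
    WHERE status = 'complete'
)
GROUP BY customer;
""")

# Business users query the curated view, not the raw table.
revenue = dict(conn.execute("SELECT customer, revenue FROM customer_revenue"))

# A simple data test: every customer appears exactly once in the model.
duplicates = conn.execute("""
    SELECT customer FROM customer_revenue
    GROUP BY customer HAVING COUNT(*) > 1
""").fetchall()
```

Because the deduplication and filtering live in the view, every downstream tool that reads `customer_revenue` gets the same number for each customer, whichever BI tool issues the query.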
In the beginning, a small team of data engineering heroes was responsible for everything. And that team would report into a function like IT, Engineering or Finance, with a small, tactical part of that function’s budget allocated to “data”.
That too has changed. Data is now among the top 3 priorities at the company level, with executive ownership and a formal budget. Titles vary based on the size and type of the company, for example “Chief Data Officer”, “VP of Analytics” or “Director of Data”.
That executive is the link between the business and more specialized teams and their leaders. Each team focuses on a specific layer of the whole data stack. We broadly see three disciplines:
- Data Scientists / Analysts: Work with business users to understand data requirements, drill down into data for deep insights, build dashboards
- Analytics Engineers: Provide clean data for analytics by downstream users, build tests, maintain data documentation & definitions, train users on how to use the data platforms
- Data Engineers: Build the data platform, manage overall pipeline orchestration, optimize warehouse cost and performance
3. “Zero admin” Data warehouses with uncapped consumption
Vendors have abstracted the operation of the warehouse away from the infrastructure. Traditional DBA tasks like capacity management and provisioning have already been automated. In the case of Amazon Redshift, there are new features like AutoWLM, AutoVacuum and Elastic Resize that reduce the time spent on fine-tuning a cluster.
The world of data warehouses is now serverless and the notion of “managing a cluster” has gone away. Snowflake and BigQuery pioneered the concept, and even Oracle calls their warehouse “autonomous”. Redshift now even offers features like Spectrum and Concurrency Scaling that don’t require any provisioning of nodes anymore. The new RA3 node type supports separation of storage and compute.
In fact, customers are telling us that they don’t want to spend any time on managing their Redshift cluster at all. They want to run a “zero admin” warehouse that scales elastically and doesn’t require maintenance, so they can re-allocate their time to higher-value work.
However, serverless architectures lead to new problems around cost management and resource planning.
Analytics for Data Products
As with any customer-facing product, more effort must go into how that product is developed, what it costs to deliver, and how it’s being used. When we asked our customers “why do you log into intermix.io every day?”, the following use-cases emerged:
- User engagement. Analytics engineers spend much of their time preparing and delivering tables. Understanding how those tables convert and retain users is critical. Intermix.io unmasks the end-users and dashboards as well, so you can get a granular understanding of who is using your data.
- Resource planning. More data, users and models make it hard to scale your data warehouse efficiently. Teams need to know which tables and models are stale or not being used anymore, so they can cut waste and complexity.
- Cost. Uncapped consumption leads to unpredictable costs, and the cost for compute can rack up fast. Teams want to have a detailed understanding of which users and models drive on-demand query costs.
- Performance. With the number and complexity of models only going up, slow and expensive queries lead to stuck models, stale data, and frustrated users. Teams need to track dozens of query metrics to model & troubleshoot how data pipelines and BI dashboards are performing.
- Security. The data warehouse is now the nerve center of data for your company. It’s important to understand who and what is touching that data.
- Multiple Sources. Many users had different data warehouses purpose-built for different use-cases and teams. So they wanted an easy way to see everything in a single pane of glass.
More and more tools, users and models increase the complexity of scaling and delivering data products in a cost-effective way. Analytics teams need to understand how users engage with the data they prepare, so they can manage their customers’ experience.
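Two of the use-cases above, cost attribution and finding stale tables, can be sketched from a simple query log. The Python example below is only an illustration under stated assumptions: the `QueryRecord` fields, the function names, the 30-day idle threshold, and the sample data are all hypothetical, and this is not a description of how intermix.io implements these features.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QueryRecord:
    """One entry of a hypothetical query log."""
    user: str
    table: str
    run_date: date
    cost_usd: float


def cost_by_user(log):
    """Sum query cost per user to see who drives on-demand spend."""
    totals = defaultdict(float)
    for q in log:
        totals[q.user] += q.cost_usd
    return dict(totals)


def stale_tables(all_tables, log, today, max_idle_days=30):
    """Return tables never queried, or not queried within max_idle_days."""
    last_used = {}
    for q in log:
        if q.table not in last_used or q.run_date > last_used[q.table]:
            last_used[q.table] = q.run_date
    return sorted(
        t for t in all_tables
        if t not in last_used or (today - last_used[t]).days > max_idle_days
    )


# Made-up sample data for illustration only.
log = [
    QueryRecord("dashboard_svc", "orders_daily", date(2020, 2, 28), 0.5),
    QueryRecord("dashboard_svc", "customer_revenue", date(2020, 2, 20), 0.25),
    QueryRecord("analyst_ann", "legacy_funnel", date(2019, 12, 1), 1.0),
]
tables = ["orders_daily", "customer_revenue", "legacy_funnel", "tmp_export"]

costs = cost_by_user(log)
stale = stale_tables(tables, log, today=date(2020, 3, 1))
```

With this sample log, `legacy_funnel` (last queried three months earlier) and `tmp_export` (never queried) are flagged as candidates for cleanup, while the cost totals show which user accounts for most of the spend.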
We built the new product from the ground up to solve these use-cases.
Check out this article on specific details of the new product and what to expect.