
Two years ago we launched a product to manage performance for data warehouses. Today we launched a new product, a new website, and a new mission to help data teams build better data products.

Existing users should read this blog post on specific changes and frequently asked questions.


We helped you manage performance and scale your cluster

A few years ago, performance was the most important thing. Everybody faced the same struggles: fixing slow dashboards, tuning the Redshift WLM, finding tables that needed vacuuming, and generally feeling confident that when the cluster rebooted because it ran out of disk space, you at least knew why.

A lot of time went into managing and scaling a cluster, trying to keep the lights on, while query volume and the overall amount of stored data mounted. Using intermix.io – along with our Slack channel and expert advice – made the pesky task of managing a cluster much easier. We gave you cluster recommendations, intuitive WLM charts, and growth rates for tables, along with the right vacuum scripts.

We helped you solve those problems with a single, opinionated dashboard.  Every customer got the same dashboard, with the same charts, showing you the same metrics. And in a world where everybody has the same challenges, it makes sense to have a product that works the same way for everybody.

The needs of data teams have evolved

Data teams were small when we first started. The primary use cases were basic reporting and pretty dashboards. There were two primary roles:

But there has been a lot of innovation in analytics and data engineering, and as a result, the work and expectations for data teams have changed. Teams have shifted from a reactive “question / answer” analytics world to being part of operations, embedded in product teams, behaving as partners in the product development process.

Three key changes are happening:

1. Data Products

The data warehouse has emerged as the nerve-center for data. The actual data inside of the warehouse now feeds into digital assets, customer-facing products, and other applications that serve internal users. 

Data is becoming more and more mission-critical, and real-world examples of data products include personalization of product experiences, pricing algorithms for marketplaces, supply chain management for WiFi routers, and daily settlement of commodity exchange contracts.

The result of all this is that data has become a product.  We’ve started compiling stories from our customers on how they build data products at their own companies.

2. Specialization of roles with different problems and needs

Data integration used to be tedious, manual integration work. Today, it’s largely automated by tools like Fivetran, Matillion and Stitch. These tools have turned data integration into a matter of a few clicks. It has never been easier to get data in its raw form into a warehouse.

At the same time, visualization and exploration tools like Metabase, Periscope, and Looker introduced easy self-service access to data and gave rise to the ‘citizen analyst’. These tools made it easy to create and share insights – work that would otherwise require a data engineer.

But easy access to data created a host of new issues – duplicate work, diverging metrics, and an explosion of queries.

Data teams realized they needed to more tightly control the data and provide a single version of the ‘truth’ to their organization. In response, they started using tools like Airflow, dbt and Matillion to establish a ‘semantic’ layer within the data warehouse – an abstraction layer that gets built by modeling and running the most complex transformations in SQL on top of raw data.

Data in this new modeling layer is curated and prepared. It is of high quality and purpose-built. It’s cleaned, well-defined, transformed, tested, and documented. Because of the high quality of this data and the associated documentation, business users are able to bring the tools of their choice to access data, while getting reliable, consistent results.
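For illustration, a model in this layer can be as simple as one SQL transformation that turns raw events into a curated, documented table. Here is a minimal sketch, with entirely hypothetical raw and curated table names:

```sql
-- Hypothetical modeling-layer transformation: build a curated
-- daily_orders table from raw order events.
CREATE TABLE analytics.daily_orders AS
SELECT
    DATE_TRUNC('day', ordered_at) AS order_date,
    customer_id,
    COUNT(*)                      AS order_count,
    SUM(amount_usd)               AS revenue_usd
FROM raw.order_events
WHERE status = 'completed'     -- keep only finished orders
  AND amount_usd IS NOT NULL   -- drop malformed rows
GROUP BY 1, 2;
```

In practice, a tool like dbt or Airflow would materialize, test, and document a model like this on a schedule.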

In the beginning, a small team of data engineering heroes was responsible for everything. And that team would report into a function like IT, Engineering or Finance, with a small, tactical part of that function’s budget allocated to “data”.

That too has changed. Data is now among the top 3 priorities at the company level, with executive ownership and a formal budget. Titles vary based on the size and type of the company, for example “Chief Data Officer”, “VP of Analytics” or “Director of Data”.

That executive is the link between the business and more specialized teams and their leaders. Each team focuses on a specific layer of the whole data stack. We broadly see three disciplines:

3. “Zero admin” data warehouses with uncapped consumption

Vendors have abstracted the operation of the warehouse away from the infrastructure. Traditional DBA tasks like capacity management and provisioning have already been automated. In the case of Amazon Redshift, there are new features like AutoWLM, AutoVacuum and Elastic Resize that reduce the time spent on fine-tuning a cluster. 

The world of data warehouses is now serverless and the notion of “managing a cluster” has gone away. Snowflake and BigQuery pioneered the concept, and even Oracle calls their warehouse “autonomous”. Redshift now even offers features like Spectrum and Concurrency Scaling that don’t require any provisioning of nodes anymore. The new RA3 node type supports separation of storage and compute. 

In fact, customers are telling us that they don’t want to spend any time on managing their Redshift cluster at all. They want to run a “zero admin” warehouse that scales elastically and doesn’t require maintenance, so they can re-allocate their time to higher-value work. 

However, serverless architectures lead to new problems around cost management and resource planning.

Analytics for Data Products

As with any customer-facing product, more effort must go into how that product is developed, what it costs to deliver, and how it’s being used. When we asked our customers “why do you log into intermix.io every day?”, the following use cases emerged:

More and more tools, users, and models increase the complexity of scaling and delivering data products in a cost-effective way. Analytics teams need to understand how users engage with the data they prepare, so they can manage their customers’ experience.

We built the new product from the ground up to solve these use cases.

Check out this article on specific details of the new product and what to expect.

Custom reports, table dependencies, cost analytics, and more. Start using the new intermix.io today.

Start Using It Now

New Look and Feel

[Screenshot: events table dashboard]

The app’s polished, clean, and responsive interface makes it easy to find exactly what you’re looking for.

Custom Reports

[Screenshot: reports dashboard]

The product can be used to create custom dashboards and reports across a wide variety of use cases. This means that each member of your team can build a report for exactly what they care about, and save that report to come back to it later.

The existing charts in the intermix.io product can be re-implemented as custom reports in the new intermix.io.

Cost Analysis and Resource Planning

[Screenshot: AWS Spectrum cost report]

Drill down into on-demand Spectrum costs, and reduce data storage by finding tables that are never used (stale tables). See which users and workloads are contributing to spend.

Table Dependencies

[Screenshot: table dependencies view]

Surface the tables and columns your queries touch so you can see dependencies between users, models, and tables. Find the most active models and tables, and the ones that are never used. Understand the data access patterns of users and jobs. See the table dependencies for users and workflows.

Raw Data Export

Choose to download a CSV of the raw data for any report. Manipulate the data locally to integrate into presentations and communicate internally with your stakeholders.

Multiple Sources

[Screenshot: data sources]

Easily add your dev & test warehouses and view them as new ‘sources’ when defining Jobs. We never need access to data you’ve copied into your data warehouse.

Start Using It Now

Frequently Asked Questions

How can I access the new product?

Go to the following URL:

https://app.intermix.io

Your existing password works, and you can log in with either 1) your email address or 2) your username.

How long can I use the existing product?

You will be able to use the existing product until January 15th, 2020. 

I love the current charts. Will I be able to see similar visualizations in the new product?

Yes. And for current customers, we’ll recreate the charts from the existing intermix.io product as custom reports in the new product.

How can I get support?

There are three ways to get support.  

Will the pricing change?

Yes. We’re switching pricing from a capacity-based model ($ / vCPU / month) to a utilization model based on “SQL Groups” ($ / # of SQL groups). Please see our pricing page for more details.

Did you cut some features?

Yes. There are a few areas of the product that we’ve decided to deprecate. This is because Amazon Redshift has already automated, or plans to automate, much of the functionality around administering the cluster.

Will I need to reinstall my cluster?

No. Data from your existing data warehouse will be available in the new product experience. Simply select it as a ‘Source’ on the Jobs page when defining a new Job.

Does my team know?

Yes. We have sent everyone an email, and also added a banner into the current product with a link to the new product.

“Real-time Fraud Detection”

This is part of a series of interviews on how companies are building data products. In these interviews, we’re sharing how data teams use data, with a deep dive into a data product at the company. We also cover tech stacks, best practices and other lessons learned.

About

Aaron Biller is a lead data engineer at Postmates, an on-demand delivery platform with operations in 3,500 cities in the US and Mexico. With over 5 million deliveries each month, Postmates is transforming the way food and merchandise are moved around cities.

A successful, on-time delivery is the single most important event for Postmates’ business.

“In the very, very early days, we would query our production database to understand how many deliveries we had for the past day. Reporting on our metrics would happen with spreadsheets. Clearly that wouldn’t scale, and so we shifted our analytics to a data warehouse and used Amazon Redshift,” says Aaron Biller, Engineering Lead for Data at Postmates.

What business problem does this data product solve? 

“Data is ubiquitous at Postmates. It’s much more than reporting – we’re delivering data as a product and support data microservices”, says Biller.

Consider fraud prevention. On-demand platforms like Postmates have a unique exposure to payments and fraud because they have to assess risk in real-time. 

While the warehouse does not operate in real-time, it ingests and transforms event data from all transactions for downstream consumption by predictive models and real-time services.

The Postmates risk team has engineered an internal risk detection microservice called “Pegasus”. Event data passes through a series of transformations in Redshift and feeds into “business rules”, which take the transformed data as input and produce decisions as output – live decisions for every individual transaction on the Postmates platform.
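The interview doesn’t go into the internals of Pegasus, but as a rough sketch, a warehouse-side business rule can be expressed as a query over the transformed data. All table and column names below are hypothetical:

```sql
-- Hypothetical business rule: flag transactions that deviate
-- sharply from a customer's recent payment profile.
SELECT
    t.transaction_id,
    CASE
        WHEN t.amount_usd > 5 * c.avg_amount_usd_30d THEN 'review'
        WHEN t.card_country <> c.home_country        THEN 'review'
        ELSE 'approve'
    END AS decision
FROM transformed.transactions t
JOIN transformed.customer_profiles c USING (customer_id);
```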

In addition to fraud, the data team has built an infrastructure that drives four major use cases for data:

What is the tech-stack used?

As things have progressed at Postmates, they have added more developers, more microservices and more data sources. “The amount of data we have, period, and the amount of new data we generate every day has expanded exponentially,” says Biller, describing the growth at Postmates.

Consider the amount of data collected during “peak delivery time” on Sunday nights, when people order their dinner to eat at home.

“Three years ago, we captured data from a certain number of ongoing deliveries on a Sunday night at peak. We’re now at about 30x the number of deliveries in flight. And we’re also monitoring and tracking so many more events per delivery. In short, we’re doing 30x the deliveries, and a single delivery includes 10x the data, and it just keeps growing,” explains Biller.

Amazon Redshift and Google BigQuery are the primary data warehouses.  

What are the sources of data?

The vast majority of raw data comes from the Postmates app itself. In addition to the app, the data team has built integrations with 3rd party services. Examples include:

“You can write a query that combines 13 data sources in one single query and just run it and get data. That’s extraordinarily useful and powerful from an analytics and reporting perspective.”
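Because every integration lands in the same warehouse, a cross-source query is just SQL. A minimal sketch, assuming hypothetical schemas for the app, payments, and a support tool:

```sql
-- Hypothetical cross-source query: join app events, payment charges,
-- and support tickets, each loaded into its own schema.
SELECT
    d.delivery_id,
    p.charge_amount_usd,
    COUNT(t.ticket_id) AS support_tickets
FROM app_events.deliveries d
LEFT JOIN payments.charges p ON p.delivery_id = d.delivery_id
LEFT JOIN support.tickets  t ON t.delivery_id = d.delivery_id
GROUP BY 1, 2;
```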

“Used by over 300 people”

This is part of a series of interviews on how companies are building data products. In these interviews, we’re sharing how data teams use data, with a deep dive into a data product at the company. We also cover tech stacks, best practices and other lessons learned.

About

Stephen Bronstein leads the Data Team at Fuze. It’s a “skinny team” of 3 people that supports all of Fuze’s data needs.

Over the course of the past three years, Stephen has led the team through warehouse transitions and performance tuning, adoption of new data sources, regular surges of new data and use cases, and the on-boarding of hundreds of new data-users. “People care a lot about having the right data at the right time. It’s crucial to drive their work forward,” says Bronstein. 

Who is the end-user of this data product?

Fuze is a cloud-based communications and collaboration platform provider. The Fuze platform unifies voice, video, messaging, and conferencing services on a single, award-winning cloud platform, and delivers intelligent, mobile-ready apps to customers.

As Fuze has grown its customer base and employee count, data has become a mission-critical component. More than 300 people query the data warehouse on a constant basis across Fuze’s 19 global office locations.  

Departments include Finance, Sales, Product, and Customer Support.

What business problem does this data product solve? 

Each day, critical business functions query the data warehouse:

What is the tech-stack used?

A central Amazon Redshift data warehouse combines data from a growing number of sources and events. The toolchain around Amazon Redshift includes:

Data modeling is bespoke, and Fuze runs large-scale data transformations within Redshift in SQL. 

Watch the full video and download a complete transcript of our conversation.

Summary

Arvind Ramesh is the manager of the data team at Envoy. In this post, we’re sharing how Arvind’s team has built a data platform at Envoy, with a deep dive into a data product for the Customer Success Team. Arvind believes that the most important skill in data is storytelling, and that data teams should operate more like software engineering teams. You can find Arvind on LinkedIn.

For more detail, you can also watch the full video of our conversation. 

Data at Envoy: Background & Business Context

Envoy helps companies manage their office visitors and deliveries. Chances are you’ve already used Envoy somewhere in a lobby or a reception desk, where an iPad kiosk runs the Envoy app. The app checks you in, prints your name tag and sends your host a message that you’ve arrived. Guests can also use the “Envoy Passport” app for automatic check-in, collecting different “stamps” for each office visit, similar to visa stamps at customs when entering a country. Hosts can manage visitors and other things like mail and packages via a mobile app. 

[Screenshot: Envoy app]

The apps generate a growing amount of data every minute. Envoy has welcomed over 60M visitors, adding 130,000 new visitors every day, in over 13,000 locations around the globe. Envoy’s business model is a subscription where pricing scales with the number of locations and deliveries. Providing analysis and reporting for decision-making across that location and visitor footprint is of course valuable.

But basic reporting is not a data product. “The way I look at it, a data product is not really delivering an insight per se, but it’s delivering ‘a thing’ that can be used by other people within or outside of the company to improve their workflow,” says Arvind.

[Screenshot: Envoy iPhone app]

At Envoy, data products are about enabling people to do more with the data Envoy generates. “The advantage of data products is that they enable other people to more easily access data and use it in their workflow without us having to go back and forth for every piece of insight,” says Arvind.

Part of Envoy’s secret to success is using a data product to drive account penetration and adoption of the Envoy app. 

Who is the end-user of this data product?

The Customer Success team uses Gainsight, a suite of SaaS products for monitoring and managing product adoption. “Gainsight is basically a UI that allows you to better manage your customer base, see how people are using your product, where there are problems, where you might need to focus your attention. But, like most tools, it is only valuable if it has a stream of comprehensive, reliable, and accurate customer data underpinning it.”

[Screenshot: Gainsight]

Gainsight offers a full customer data platform, but in Envoy’s case the Gainsight app is a “skinny” UI, which sits on top of the Envoy data platform. In this specific case, it’s a table called “Gainsight company facts”. A daily batch process combines data from many different sources into the final table.

“The way I think of it, Gainsight is the UI that sits on top of our data platform, because it would not be the best use of our time, or anyone’s time, to build a UI for that kind of thing. The way we think of data products, it’s a backend data service or data infrastructure. The actual last mile is a web dashboard or something similar. Usually we’ll use a tool to accomplish that.”

What Business Problem Does this Data Product Solve?

Data delivered in Gainsight helps the Customer Success team prioritize the accounts where product adoption is below par, and locations are at risk of churning. 

Compared to the raw source data that’s scattered across various sources and places, the new “Gainsight company facts” table has reliable, useful information in a single place, such as:

“We have a few customer success managers, and each of them will have several hundred accounts,” says Arvind. “They can have a conversation with the customer: ‘OK, you have 50 offices, but look, these five are actually having some issues and maybe you should focus on these.’” Arvind’s team helps make those conversations more effective with product usage data.

What Are the Data Sources & Tech Stack?

The data for the “Gainsight company facts” table is the result of a daily batch process. The process cleans, transforms and combines raw data from different sources within the Envoy data warehouse.

The image below shows a summary of the graph, or “DAG” involved in building the model. For every company, for every day, the output table contains a variety of key data points. 
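As a minimal sketch of what the final step of such a batch model might look like, with hypothetical source tables and metrics (the real DAG combines many more inputs):

```sql
-- Hypothetical final step of the daily batch: one row per company,
-- combining usage and billing facts as of today.
CREATE TABLE analytics.gainsight_company_facts AS
SELECT
    c.company_id,
    CURRENT_DATE                  AS as_of_date,
    COUNT(DISTINCT v.location_id) AS active_locations,
    COUNT(v.visit_id)             AS visits_last_30d,
    b.plan_tier
FROM analytics.companies c
LEFT JOIN analytics.visits v
       ON v.company_id = c.company_id
      AND v.visited_at >= CURRENT_DATE - 30
LEFT JOIN analytics.billing b
       ON b.company_id = c.company_id
GROUP BY c.company_id, b.plan_tier;
```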

To run the batch process, Arvind’s team has built a platform that consists of five key components.

The model for Gainsight takes about 40 minutes to run, including completing all upstream dependencies for the freshest possible data.

For other data products, the SLA can be closer to real-time, and models run every three hours or even more frequently. But “typically it’s pretty rare for people to truly need data real-time or even close to real time, unless it’s something operational or you’re looking at server metrics.”

Best practices and lessons learned

In addition to customer success, the platform supports other data products, and building the underlying models requires domain knowledge.

“That’s typically where we do a lot of work on our end. You really have to understand how our business works, how our activation metrics are defined, or how our products interact with each other”, says Arvind. Usually one person is an expert for a specific domain, but with peer review, documentation and a QA rotation, domain knowledge starts to make its way across the team. 

“Our team is eight people, and that’s split across data engineering, product analytics and go-to-market analytics. We’re about 170 people right now total at Envoy, which translates to about 5% of the company working in data-focused roles. If we build effective data products and continue to iterate on operational efficiencies, our team should not have to scale linearly with the company.”

Building the platform has been an iterative process. “It’s probably a bit cliché, but I think data teams should work more like software engineering teams”. 

First was testing. “After building out the first set of several dozen models, we quickly realized that ensuring everything stayed accurate was a challenge. People were finding issues in the data before we were, which was embarrassing for us, since we felt accountable for the accuracy of anything we put out. The upstream data was at times incorrect, and could change in ways we hadn’t anticipated. So we implemented tests – basically data sanity checks running on a schedule that check the intermediate and final transformations in our pipeline. But then, when some of these tests failed, it wasn’t evident who on the team was responsible. They went unfixed for a long time, as each member of the team thought someone else would fix the issue.”
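A data test of this kind is often just a scheduled query that should return zero rows. A minimal sketch, reusing the hypothetical table from the earlier example:

```sql
-- Hypothetical sanity check: any returned row means the test failed.
SELECT company_id, active_locations
FROM analytics.gainsight_company_facts
WHERE active_locations IS NULL
   OR active_locations < 0;
```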

So next up was a QA rotation. “Now, every week, there’s someone else who’s responsible for fixing any data test issues that pop up, and the ‘on-call’ schedule is managed through PagerDuty.” As the company and the data team keep growing, the person responsible for fixing a failed test may not understand the full logic that’s going on.

That meant better documentation on how things are built and how to debug tests. “And again, I keep drawing this parallel, but if you look at any software engineering team and open up their knowledge base, you will likely see a postmortem for every outage and issue that happens, and runbooks to fix them.”

Arvind’s team has started to use the data platform for advanced data products, like churn prediction and machine learning. Envoy is trying Amazon SageMaker for that. 

Another project is integrating the data platform with the customer-facing analytics. “Obviously, when people log into Envoy, there are already analytics. How many visitors are coming to their office, how many packages they’re scanning. That’s done through a white-labeled solution with a BI vendor”, says Arvind.

“But eventually the data team is going to build an internal microservice where the Envoy app will just request what it wants and then we will return a bunch of things that we think are valuable”.