Announcing App Tracing – Monitoring Your Data Apps With intermix.io

App Tracing surfaces important information about how apps & users interact with your data. It can help answer questions like:
- which user is responsible for this spike in concurrency?
- who is the most “expensive” Looker user?
- what is the average latency of a dashboard or model? Of all dashboards executed by a particular user?
- my Apache Airflow task latency is increasing or jobs are failing. What is causing that?
What is a “Data App”?
There are three categories of data apps:
- Data integration services. Vendors who ETL data from external systems or applications into your data environment.
- Workflow orchestration. Tools that schedule and coordinate workflows, typically batch processing in your data pipeline.
- Visualization & Analysis. Reporting, modeling and visualization apps used by analysts and data scientists.
How it Works
App Tracing requires the data app to annotate the SQL it executes with a comment. The comment encodes metadata about the application that submitted the query.
intermix.io automatically indexes all data contained in the annotation and makes it available as first-class labels in our system, e.g. for Discover searches, Saved Searches, and aggregations on the Throughput Analysis page.
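To make the annotate-then-index idea concrete, here is a minimal round-trip sketch. The tag format is hypothetical (the `app_tag:` prefix, the base64-encoded JSON payload, and the helper names `annotate_sql` and `extract_labels` are all made up for illustration); intermix.io's real format comes from its Tag Generator and may look quite different.

```python
import base64
import json
import re

# Hypothetical tag format, for illustration only.
TAG_PREFIX = "app_tag:"
TAG_PATTERN = re.compile(r"/\*\s*" + re.escape(TAG_PREFIX) + r"([A-Za-z0-9+/=]+)\s*\*/")


def annotate_sql(sql: str, metadata: dict) -> str:
    """Prepend a SQL comment carrying base64-encoded JSON metadata to a query."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return f"/* {TAG_PREFIX}{encoded} */ {sql}"


def extract_labels(sql: str) -> dict:
    """Recover the metadata from an annotated query; return {} if it is untagged."""
    match = TAG_PATTERN.search(sql)
    if match is None:
        return {}
    return json.loads(base64.b64decode(match.group(1)))


if __name__ == "__main__":
    tagged = annotate_sql("SELECT COUNT(*) FROM orders", {"app": "looker", "user_id": 248})
    print(tagged)
    print(extract_labels(tagged))
```

Base64-encoding the payload is a design choice in this sketch: it guarantees the metadata can never contain a `*/` sequence that would close the comment early and corrupt the query.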
Supported Apps
Out of the box, we support:
- Looker
- Mode
- Periscope Data
- Chartio (coming soon!)
- Stitch Data (coming soon!)
- dbt
- Segment
- Apache Airflow (via a plugin)
Don’t see your data app? No problem. Any query tagged with our format is detected automatically. See our documentation for instructions on using the Tag Generator to create tags to embed in your SQL.
Example
Which Looker user is causing a concurrency spike?
In the example below, a spike in queries in WLM queue 3 creates a bottleneck. As a result, queries that would otherwise take 13-14 seconds to execute are stuck in the queue for more than 3 minutes.
App Tracing detects that the majority of these queries are from Looker. How do you know which user is causing this?
Click on the chart, and a widget will pinpoint the specific Looker user(s) who ran those queries. In this example, we see that user 248 is responsible.
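Conceptually, that breakdown is a group-by over the parsed labels. The sketch below is a minimal illustration, not intermix.io's implementation. It assumes each query has already been reduced to a hypothetical `QueryRecord` holding its submit time, WLM queue number, and labels such as `app` and `user_id`.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class QueryRecord:
    """One executed query plus the labels parsed from its (hypothetical) tag."""
    submitted_at: datetime
    queue: int                      # WLM queue the query ran in
    labels: dict = field(default_factory=dict)


def top_users(records, app, queue, start, end, n=3):
    """Return the n users with the most queries for one app and queue in [start, end)."""
    counts = Counter(
        r.labels["user_id"]
        for r in records
        if "user_id" in r.labels
        and r.labels.get("app") == app
        and r.queue == queue
        and start <= r.submitted_at < end
    )
    # most_common yields (user_id, query_count) pairs, busiest user first.
    return counts.most_common(n)
```

Restricting the count to the spike window and the affected queue matters: the same user might run plenty of queries at quiet times without being the cause of the bottleneck.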
Armed with this information, you can now:
- Ask the person why they are running so many queries 🙂
- Optimize the queries behind this user’s dashboards
- Increase the concurrency of the queue to reduce queue times
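If you manage Amazon Redshift with manual WLM, queue concurrency lives in the `wlm_json_configuration` parameter: a JSON array of queue definitions, each with a `query_concurrency` slot count. The sketch below raises the slot count of one queue. It assumes queues are numbered from 1 in array order, which may not match how your tooling numbers them. The helper name `set_queue_concurrency` is hypothetical, and the resulting string would still need to be written back to the cluster's parameter group.

```python
import json


def set_queue_concurrency(wlm_json: str, queue_number: int, slots: int) -> str:
    """Return a copy of a manual-WLM JSON config with one queue's slot count changed.

    Assumption: queue_number is 1-based and follows the order of the JSON array.
    """
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    queues = json.loads(wlm_json)
    if not 1 <= queue_number <= len(queues):
        raise ValueError(f"no queue number {queue_number} in this configuration")
    queue = queues[queue_number - 1]
    if "query_concurrency" not in queue:
        # Entries without a slot count are not regular user queues; leave them alone.
        raise ValueError(f"queue {queue_number} has no query_concurrency setting")
    queue["query_concurrency"] = slots
    return json.dumps(queues)
```

Keep the trade-off in mind: with manual WLM, a queue's memory is split across its slots, so more slots means less memory per query. That can push memory-hungry queries to spill to disk.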
Monitoring & Setting an Alarm
To see all activity for this user, head to Discover and use the new ‘App’ filter to search for Looker user 248.
To get email notifications from an alarm, save that search and stream the following metrics to CloudWatch:
- Query count
- Execution time & queue time
- # of rows scanned & memory consumed by queries run by this user
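Once the metrics are in CloudWatch, the alarm itself is a standard CloudWatch alarm whose action is an SNS topic with an email subscription. The sketch below only builds the keyword arguments for boto3's `put_metric_alarm`. The namespace `intermix/SavedSearches`, the metric name `QueryCount`, and the `SavedSearch` dimension are hypothetical placeholders, so substitute whatever names your streamed metrics actually appear under.

```python
def build_alarm_params(search_name, metric_name, threshold, sns_topic_arn,
                       namespace="intermix/SavedSearches", statistic="Sum",
                       period_seconds=300):
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    namespace, metric_name and the "SavedSearch" dimension are hypothetical.
    """
    return {
        "AlarmName": f"{search_name}-{metric_name}",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "SavedSearch", "Value": search_name}],
        "Statistic": statistic,          # e.g. "Sum" for query count, "Average" for queue time
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],  # SNS topic with an email subscription
    }
```

With AWS credentials configured, the alarm would be created with `boto3.client("cloudwatch").put_metric_alarm(**params)`. The parameters are kept in a plain dict so they are easy to review before anything touches your account.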
See What Customers are Saying
We soft-launched App Tracing on the morning of this blog post, and it didn’t take our customers long to notice. Here is a screenshot of a Slack conversation we had today (every one of our customers has a direct line to our team).
Using Apache Airflow?
If you’re using Amazon Redshift in combination with Apache Airflow and you’re trying to monitor your DAGs, we’d love to talk! We’re running a private beta of a new Airflow plugin with a few select customers. Go ahead and click on the chat widget at the bottom right of this window. Answer three simple questions, schedule a call, and mention “Airflow” at the end, and we’ll get you set up! As a bonus, we’ll throw in an extended trial of 4 weeks instead of 2!
Photo by Denise Johnson