Announcing App Tracing – Monitoring Your Data Apps With intermix.io

App Tracing surfaces important information about how apps and users interact with your data. It can help answer questions such as which app, or which user of that app, is responsible for a spike in queries.

What is a “Data App”?

There are three categories of data apps:

  1. Data integration services. Vendors that extract, transform and load (ETL) data from external systems or applications into your data environment.
  2. Workflow orchestration. Tools that schedule and run jobs on your data pipeline, typically batch processing.
  3. Visualization & Analysis. Reporting, modeling and visualization apps used by analysts and data scientists.

How it Works

App Tracing requires the data app to annotate the SQL it executes with a comment. The comment encodes metadata about the application that submitted the query.
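As a rough illustration of the annotation step, the sketch below prepends a SQL comment carrying app metadata to a query. The `APP_TRACE` marker, the JSON payload, and the `annotate_sql` helper are all assumptions made for this example; the actual intermix.io tag format is not shown in this post, so use the Tag Generator for real tags.

```python
import json


def annotate_sql(sql: str, metadata: dict) -> str:
    """Prepend a block comment that carries app metadata as JSON.

    The "APP_TRACE" marker and the JSON layout are hypothetical,
    not the real intermix.io tag format.
    """
    payload = json.dumps(metadata, sort_keys=True)
    # A "*/" inside the payload would close the comment early, so reject it.
    if "*/" in payload:
        raise ValueError("metadata must not contain '*/'")
    return f"/* APP_TRACE {payload} */\n{sql}"


# Example: tag a query as coming from Looker user 248.
tagged = annotate_sql(
    "SELECT COUNT(*) FROM orders;",
    {"app": "looker", "user": "248"},
)
print(tagged)
```

Because the metadata travels inside a comment, the database ignores it when executing the query, while the full query text (comment included) remains available to whatever inspects the query logs.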

intermix.io automatically indexes all data contained in the annotation and makes it accessible as first-class labels in our system. That means you can use these labels in Discover searches, Saved Searches, and aggregations on the Throughput Analysis page.
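To give a feel for what indexing an annotation could involve, here is a minimal sketch that pulls a metadata comment out of a query and turns it into a dictionary of labels. It is not how intermix.io is implemented; the `APP_TRACE` marker, the JSON payload, and the `extract_labels` function are hypothetical names chosen for this example.

```python
import json
import re

# Hypothetical comment layout: /* APP_TRACE {...json...} */
ANNOTATION = re.compile(r"/\*\s*APP_TRACE\s+(\{.*?\})\s*\*/", re.DOTALL)


def extract_labels(sql: str) -> dict:
    """Return the metadata labels embedded in a query, or {} if none are found."""
    match = ANNOTATION.search(sql)
    if match is None:
        return {}
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # Treat a malformed annotation like a missing one.
        return {}


query = '/* APP_TRACE {"app": "looker", "user": "248"} */\nSELECT 1;'
labels = extract_labels(query)
print(labels)
```

Each key in the returned dictionary can then be treated as a label to filter or group on. Untagged queries simply yield no labels.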

Supported Apps

Out of the box, we support:

Don’t see your data app? No problem. Any queries tagged with our format will be automatically detected. See here for instructions on using the Tag Generator to create tags to embed into your SQL.

Example

Which Looker user is causing a concurrency spike?

In the example below, a query spike in WLM 3 causes a bottleneck in query latency. As a result, queries that would otherwise take 13-14 seconds to execute are stuck in the queue for more than 3 minutes.

App Tracing detects that the majority of these queries are from Looker. How do you know which user is causing this?

Click on the chart, and a widget will pinpoint the specific Looker user(s) who ran those queries. In this example, we see that user 248 is responsible.

[Image: app_tracing_looker]
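The drill-down above boils down to grouping the queued queries by their user label. A small sketch of that aggregation follows; the record layout, the `top_users` helper, and every sample value except user 248 are invented for illustration and do not come from intermix.io.

```python
from collections import Counter

# Hypothetical query records, as they might look once labels are indexed.
queries = [
    {"app": "looker", "user": "248", "queue_seconds": 200},
    {"app": "looker", "user": "248", "queue_seconds": 185},
    {"app": "looker", "user": "248", "queue_seconds": 210},
    {"app": "looker", "user": "17", "queue_seconds": 12},
    {"app": "looker", "user": "17", "queue_seconds": 190},
    {"app": "airflow", "user": "etl", "queue_seconds": 5},
]


def top_users(records, app, min_queue_seconds=0):
    """Count queries per user for one app, keeping only long-queued queries."""
    counts = Counter(
        r["user"]
        for r in records
        if r["app"] == app and r["queue_seconds"] >= min_queue_seconds
    )
    # most_common() sorts users by query count, highest first.
    return counts.most_common()


# Users whose Looker queries waited at least 3 minutes in the queue.
ranking = top_users(queries, "looker", min_queue_seconds=180)
print(ranking)
```

With this sample data, user 248 comes out on top, mirroring the result in the example.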

Armed with this information, you can now monitor that user's activity and set an alarm.

Monitoring & Setting an Alarm

To see all the activity for this user, head to Discover and use the new ‘App’ filter to search for Looker user 248.

To set up an alarm to get email notifications, save that search and stream the following metrics to CloudWatch:

[Image: app_tracing_alerts_cloudwatch]

See What Customers are Saying

We soft-launched App Tracing on the morning of this blog post. It didn’t take our customers long to notice. Here is a screenshot of a Slack conversation we had today (each of our customers has a direct line to our team).

[Image: comment_app_tracing]

Using Apache Airflow?

If you’re using Amazon Redshift in combination with Apache Airflow and you’re trying to monitor your DAGs, we’d love to talk! We’re running a private beta for a new Airflow plug-in with a few select customers. Go ahead and click on the chat widget on the bottom right of this window. Answer three simple questions, schedule a call, and mention “Airflow” at the end, and we’ll get you set up! As a bonus, we’ll throw in an extended trial of 4 weeks instead of 2!


Photo by Denise Johnson
