Announcing Query Groups – Intelligent Query Classification
Query Groups is a powerful feature which intelligently classifies and ranks query workloads on your cluster. Query Groups can answer questions like:
- My cluster just experienced a sudden increase in latency. Which queries are causing it?
- Which queries are consuming the most cluster resources?
- Which queries are slowly increasing in latency, and which have the highest queue time?
How it Works
Queries are grouped together using a proprietary algorithm, and ranked by volume, execution time, and queue time. More metrics will be added in the future.
All queries in a “query group” share SQL structure and operate on the same tables.
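The grouping algorithm itself is proprietary, so the following is only a minimal sketch of the general idea, assuming that "shared SQL structure" can be approximated by stripping literals and whitespace from each statement. The `fingerprint` and `group_queries` names are hypothetical and are not part of the product.

```python
import re
from collections import defaultdict


def fingerprint(sql: str) -> str:
    """Reduce a SQL statement to a structural fingerprint.

    Illustrative assumption only: this is NOT the product's algorithm.
    """
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)                 # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)              # numeric literals -> ?
    s = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?)", s)   # literal lists -> (?)
    s = re.sub(r"\s+", " ", s)                            # collapse whitespace
    return s


def group_queries(queries):
    """Bucket raw SQL strings by fingerprint."""
    groups = defaultdict(list)
    for q in queries:
        groups[fingerprint(q)].append(q)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "SELECT * FROM orders WHERE id = 42",
        "select *  from orders where id = 7;",
        "SELECT * FROM users WHERE name = 'bob'",
    ]
    for shape, members in group_queries(sample).items():
        print(len(members), shape)
```

Because table names survive normalization, two statements land in the same bucket only if they have the same shape and touch the same tables, which mirrors the property described above.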
Example – Find the queries causing a query spike
At 8:17am on Aug. 7, the cluster below experienced an 8x spike in queries. Typically, this type of event is caused by a handful of new queries whose volume suddenly increased. How do you find out which queries they were, and who ran them?
Query groups can quickly determine which queries are responsible.
Click the new Query Groups page in the left nav. Groups are sorted by Rank by default; in this case, we want to re-sort by “Rank Change”. Sorting by rank change orders the list of query groups by the ‘fastest movers’, so you quickly see the groups that moved up the ranks in the past week.
Sure enough, we see a handful of query groups which suddenly started running. Clicking into the first one, we can isolate the exact queries.
The same procedure can be used to find the queries that spike in latency or queue time.
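To make the "Rank Change" ordering concrete, here is a rough sketch of how such a sort could be computed from per-group totals for two periods. It assumes rank 1 goes to the group with the largest metric value and that a group absent from the earlier period starts just below the old list; both are illustrative assumptions, and `rank_groups` and `rank_changes` are hypothetical names rather than the product's implementation.

```python
def rank_groups(metric_by_group):
    """Return {group: rank}, where rank 1 has the largest metric value."""
    ordered = sorted(metric_by_group, key=lambda g: (-metric_by_group[g], g))
    return {g: i + 1 for i, g in enumerate(ordered)}


def rank_changes(previous, current):
    """List (group, places moved up) pairs, fastest movers first.

    `previous` and `current` map group name -> metric total (for example
    query count, execution time, or queue time) for two periods.
    """
    prev_rank = rank_groups(previous)
    curr_rank = rank_groups(current)
    # Assumption: a group not seen last period ranks just below the old list.
    worst = len(prev_rank) + 1
    changes = {g: prev_rank.get(g, worst) - r for g, r in curr_rank.items()}
    return sorted(changes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    last_week = {"a": 100, "b": 50, "c": 10}
    this_week = {"a": 100, "b": 40, "c": 10, "d": 900}
    for group, moved in rank_changes(last_week, this_week):
        print(group, moved)
```

Passing latency or queue-time totals instead of query counts gives the same ordering for those metrics.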
We will expand the ‘grouping’ concept in the future to add:
- Group by app, e.g. rank queries by Looker user, Chartio dashboard, or Airflow task
- Metric Stream support (setting up notifications and email alerts)
- New rank metrics: data transfer, disk-based queries, and aborted queries