
Introduction

The AWS team recently released a new feature of their Automatic WLM that we’re really excited about: Query Priorities. This feature addresses the main limitation of Auto WLM head-on: it allows you to prioritize some queries over others. This is a crucial performance enhancement for meeting Data SLAs.

We first covered Redshift’s new Automatic WLM feature on our blog before Query Priority was fully released, and found mixed results:  it was highly effective (maybe too effective!) at reducing the percentage of disk-based queries, but had the side effect of increasing overall queuing on our clusters since big queries consumed most of the memory for the cluster.  For our use-case, this tradeoff didn’t work–the additional queueing caused unacceptable delays to our data pipeline. Since writing that blog post, we ended up reverting all of our clusters to our well-tuned Manual WLM.

We’ve already covered the basics of setting up Automatic WLM with Query Priorities at a very high level, and in this post we’re going to give a first look at the performance we’re seeing with this new feature.

Automatic WLM: the basics

As a reminder, Automatic WLM dynamically assigns cluster resources to queries as they run.  This is in contrast to Manual WLM, where you need to manually choose a memory and concurrency value for each queue that queries run in.  The benefit of having Automatic WLM choose these values for you is less about making it easier (optimally configuring Manual WLM is pretty easy), and more about making it dynamic–in theory each query should get the right amount of cluster resources, no more, no less.  This should lead to a more optimal use of your cluster, resulting in a higher query throughput and less wasted memory.

In practice however, Automatic WLM (without Query Priority) has no way of knowing the impact additional queue time will have on your data SLAs and hence your business.  So the net result of Automatic WLM on your data SLA might be negative, even if the cluster’s use of resources is more efficient by some measure.  

This is exactly the effect we saw in our earlier post:  long-running queries ran up to 3x faster (and fewer went to disk), since big queries were allocated more resources.  But this came at the expense of additional queue time for queries overall. The net result was a decrease in overall throughput in our data pipeline.  Manual WLM allows us to tune that tradeoff, letting some queries go disk-based if it means keeping the overall latency of our pipeline below our data SLA.

Automatic WLM with Query Priority

Automatic WLM with Query Priorities is intended to solve this exact problem: it lets you tell Amazon Redshift how to prioritize queries when allocating cluster resources.  Query Priorities are managed via Automatic WLM query queues, with each queue getting a priority:

As with Manual WLM, queries are assigned to queues using either database user groups or “query groups” (a label you can set in the database session using the SET query_group TO statement before you submit a query).  The idea is that you simply rank your workloads in terms of priority, and Redshift handles allocating cluster resources optimally.
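For example, if one of your queues were configured with a query group label of 'maintenance' (both the label and the table name below are illustrative, not from our actual setup), you could route an individual statement to that queue with a session setting:

  SET query_group TO 'maintenance';  -- 'maintenance' is a hypothetical query group label
  VACUUM my_table;                   -- this statement runs in the matching queue
  RESET query_group;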

Redshift uses these query priorities in three ways:

  1. When queries are submitted to the cluster, Redshift uses the priority to decide which queries should run and which should queue.  Queries in lower priority queues will still run, but will queue longer on average than queries in higher priority queues.
  2. When queries run, higher priority queries are allocated more resources (including memory, CPU, and I/O throughput) than lower priority queries.  Lower priority queries get all of the system resources when there are no higher priority queries.
  3. Queries in lower priority queues may get aborted (and requeued) more often to make room for queries in higher priority queues.  This occurs only when a higher priority query arrives.

Setting up our first cluster

Following the guidelines in our Automatic WLM configuration post, we looked at one of our development Redshift clusters and first enumerated the workloads we have running.  We then organized their users into database user groups based on workload:

Workload | Database User Group
Automated queries from other parts of our system | intermix_app
Data loads and Transformations | intermix_write
Validation tests | intermix_readonly
Unloads | intermix_unload
Maintenance queries | intermix_maintenance
DBLink queries | intermix_dblink
Developer Ad-hoc queries | intermix
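The database user groups above are ordinary Redshift groups. As a quick sketch, creating one and adding a user to it looks like this (etl_loader is a hypothetical username):

  CREATE GROUP intermix_write;
  ALTER GROUP intermix_write ADD USER etl_loader;  -- etl_loader is a hypothetical user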

Next, we assigned a priority to each workload and decided if any should have Concurrency Scaling enabled:

Workload | Database User Group | Priority | Concurrency Scaling
Data loads and Transformations | intermix_write | Highest | No
Automated queries from other parts of our system | intermix_app | High | Yes
Validation tests | intermix_readonly | Normal | No
Unloads | intermix_unload | Low | No
Maintenance queries | intermix_maintenance | Lowest | No
DBLink queries | intermix_dblink | High | Yes
Developer Ad-hoc queries | intermix | Low | Yes
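For reference, this mapping ultimately lives in the cluster’s wlm_json_configuration parameter. Below is a rough, abbreviated sketch for two of the queues above; the key names reflect our reading of the Auto WLM format and should be treated as an approximation rather than a verified configuration:

  [
    {"user_group": ["intermix_write"], "priority": "highest", "concurrency_scaling": "off",  "auto_wlm": true},
    {"user_group": ["intermix_app"],   "priority": "high",    "concurrency_scaling": "auto", "auto_wlm": true}
  ]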

Finally, we rebooted the cluster and started checking our intermix.io dashboard for results.

Results

At a high-level, our results looked markedly improved compared to our previous Automatic WLM tests.  We saw a significant improvement in average execution time (light blue) accompanied by a corresponding increase in average queue time (dark blue): 

latency and count trends

The net result was a small (14%) decline in overall query throughput.

As before, these changes were driven by a dramatic decrease in execution times for our slow, disk-based queries:

latency and count trends

Next, we looked at a breakdown of execution time by workload using the Insights feature of our dashboard:

execution time

This again showed a dramatic improvement in average execution times, though the overall count of queries (i.e. query throughput) went down slightly:

app queries intermix.io

We also saw an increase in queue times for all users, which is expected given that Redshift must queue lower priority queries in order to allocate resources for higher priority queries:

queue time

Finally, we saw an expected increase in aborted queries for lower priority workloads (which includes queries that are evicted and re-queued to make room for higher priority queries):

aborted queries

Our Interpretation

We suspect we didn’t see a significant improvement from Automatic WLM with Query Priority because our cluster workloads are dominated by a single, very resource-intensive batch processing pipeline (executed by the intermix_write group).  Once data is processed and reduced by this workload, it gets offloaded into other databases for serving (either Elasticsearch or RDS via DBLink).  So while our clusters do run a mix of workloads, the bulk of the work comes from this single transformation pipeline, which means there may simply not be enough wiggle-room between workloads to give Redshift much to optimize.

Further, with Manual WLM we’re able to tune our clusters for optimal performance given this dominant transformation workload, which itself is a complex mix of queries.  Our data SLA metric isn’t simply “minimize disk-based queries” or “maximize query execution time for this group of queries”–it is “decrease the final latency at the end of this complex data pipeline”.

Finally, since our workload is dominated by write queries, we don’t get much gain from enabling concurrency scaling or SQA to offload read-only queries.

Conclusions and Recommendations

Despite the fact that Automatic WLM was slightly slower than our Manual WLM configuration, we think that it is the right way to approach WLM:  it allows you as a Redshift Administrator to prioritize workloads to align with your business needs, as opposed to tuning lower-level knobs (concurrency and memory) to achieve the same result.  And as the AWS Redshift team improves their WLM tuning algorithms, we expect that the performance of Automatic WLM will continue to improve.

We recommend that most Redshift admins try Automatic WLM, especially if any of the following are true:

If you do use Automatic WLM, we strongly recommend you enable Query Priorities if you have any workloads for which consistent performance is important (and who doesn’t?).  We also recommend enabling SQA for your cluster, and enabling Concurrency Scaling for any Priority Queues which run eligible queries.  Both of these will reduce and maybe even eliminate queuing for those workloads, especially during times of increased query volume (think BI tools where your users are querying your warehouse in real time).

We’d love to hear your thoughts and if Automatic WLM with Query Priorities has worked for you–reach out to us in our Redshift community slack channel and share your experience with the community!

Earlier this year we launched an update showing query recommendations: suggestions on how to improve SQL queries to enhance performance. In most cases, improving query performance requires re-writing the SQL and re-submitting the query.

We got a lot of great feedback from teams that used the recommendations to improve queries. Two requests, in particular, stood out:

  1. Help me prioritize the recommendations. Which ones are more important? Which one should I do first?
  2. Where in the query/SQL text should I look? If you’re giving me a recommendation on a JOIN and I have 3 JOINs – which one is the problematic one?

Our team has been busy working to answer these questions. 

We’re pleased to announce a preview of a new enhancement to Query Details which surfaces a graph of the query execution. The purpose of the graph is to visually pinpoint which portion of the query execution was the bottleneck and took the most amount of time.

Armed with this data you can go back to your SQL and know exactly where to start looking.

Sample Query Execution Graph

The size of the nodes indicates elapsed time: the bigger the node, the more time was spent on that portion of the query execution.

Nodes may represent any of the following things:

Clicking on any node will open detailed metrics including memory used, rows and data processed, skew, and other useful information on what happened in that operation.

Existing Customer Preview

We’re opening up the feature to existing customers to get feedback.  (We plan to release the feature to customers on our ‘Growth’ plan later this year.)

It’s easy to request a graph and we highly encourage it. Simply find a query (the bigger the better) you’d like to see a graph for and send us a request. We’ll reply with a URL that will contain an interactive graph for your query.

There are two ways to make a request:

  1. Message the #intermix channel on our community slack http://slack.intermix.io (make sure to include the query ID, or simply paste in the URL of the query details page.)
  2. Click the blue banner on the top of the query details page – this action will generate a request to our team with all the necessary details.
query recommendations in intermix.io product

Not an Existing Customer?

No problem – if you’re not an existing customer but want to see a graph for one of your queries, simply sign up for a two-week free trial (no credit card required) and request one during your trial.

Properly managing storage utilization is critical to performance and optimizing the cost of your Amazon Redshift cluster. We’ve talked before about how important it is to keep an eye on your disk-based queries, and in this post we’ll discuss in more detail the ways in which Amazon Redshift uses the disk when executing queries, and what this means for query performance.

Amazon Redshift uses storage in two ways during query execution:

  1. Disk-based Queries. When a query runs out of memory, the overflow “spills” to the disk and the query goes “disk-based”.
  2. Intermediate Storage. When a query needs to save the results of an intermediate operation to use as input for a future operation.

Excessive storage use impacts your cluster because:

  1. Query performance suffers, because disk is 100 times slower than memory.
  2. Other queries will be slower, because increased I/O impacts the commit queue which is a shared resource.
  3. The cluster might hit 100% disk utilization, causing queries to fail or requiring the provisioning of more nodes.

The worst case is (3), when the cluster fills up. It can happen to all nodes in the cluster at once, or start with just one node and then propagate through the entire cluster, as the image below shows.

disk utilization by node

So let’s look into what we can do to fix and prevent this type of situation. 

The WLM and Disk-Based Queries

If you’re not already familiar with how Redshift allocates memory for queries, you should first read through our article on configuring your WLM.

The gist is that Redshift allows you to set the amount of memory that every query should have available when it runs. This value is defined by allocating a percentage of memory to each WLM queue, which is then split evenly among the number of concurrency slots you define.  When a query executes, it is allocated the resulting amount of memory, regardless of whether it needs more (or less). Hence allocating too much memory is wasteful (since each node in the cluster obviously has finite memory), whereas allocating too little memory can cause queries to spill to disk.
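For example, if a queue is allocated 40% of the cluster’s memory and is configured with 5 concurrency slots, every query running in that queue gets 8% of the cluster’s memory, whether it needs that much or not.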

Queries which overflow their allocated WLM memory are “disk-based”. These queries usually suffer from significantly degraded performance since disk I/O is orders of magnitude slower than memory I/O. 
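One way to spot-check this on your own cluster is to look for query steps that went to disk in the SVL_QUERY_SUMMARY system view. A minimal sketch:

  -- Query steps that spilled to disk, ordered by rows processed
  SELECT query, step, label, rows, workmem, is_diskbased
  FROM svl_query_summary
  WHERE is_diskbased = 't'
  ORDER BY rows DESC
  LIMIT 20;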

There are six types of internal operations that Redshift can spill to disk when executing a query:

  1. Aggregations
  2. Hashing for joins
  3. Saving intermediate rows for future query steps
  4. Sorting
  5. Removing duplicates from intermediate or final results (unique)
  6. Window functions

If any of these operations are processing more rows (i.e. more bytes) than will fit into allocated memory, Redshift has to start swapping data out to disk, resulting in a significant slowdown of the query.

AWS recommends that you keep the percentage of disk-based queries to under 10%.  On our own fleet of clusters, we’re usually running well under one percent:

disk-based queries

How to Prevent Queries From Going Disk-based

  1. Allocate more memory to the query. There are two approaches to this:
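For illustration, one way to give an individual query more memory under Manual WLM is to let its session temporarily claim extra concurrency slots in its queue, since each slot brings its share of the queue's memory with it:

  SET wlm_query_slot_count TO 3;  -- this session's queries now get 3 slots' worth of memory
  -- run the memory-hungry query here
  SET wlm_query_slot_count TO 1;  -- return to the default of one slot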

Within the intermix.io dashboard, viewing Recommendations for an individual query will surface the exact tables touched and how to update them, as well as how much memory the query used and how much memory capacity was available in the WLM queue.

Disk-based Queries

Intermediate Storage

Redshift also uses the disks in each node for another type of temporary query data called “Intermediate Storage”, which is conceptually unrelated to the temporary storage used when disk-based queries spill over their memory allocation.  

Intermediate Storage is used when Redshift saves the results of an intermediate operation to disk to use as input for a future operation. Intermediate Storage can become important if your query stores a large amount of data between query operations, since that storage may cause your cluster to run out of disk space. It also introduces additional I/O, which can lead to slower execution times.

Ways to Limit the Amount of Intermediate Storage Used

Since intermediate storage is used to carry results from one part of the query execution to another, the best way to reduce intermediate storage is to use predicates (e.g. WHERE clauses, JOIN … ON clauses, etc) on intermediate steps of your query (subqueries, CTEs, etc) to ensure that you are not carrying unnecessary data through your query processing.

For example, consider this query which joins on the results of two CTEs:
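A minimal sketch of that pattern, using hypothetical events and users tables (none of these names come from the original example):

  WITH all_events AS (
      SELECT user_id, event_type, occurred_at
      FROM events                            -- carries every event forward
  ), all_users AS (
      SELECT user_id, signup_source
      FROM users                             -- carries every user forward
  )
  SELECT u.signup_source, COUNT(*) AS click_count
  FROM all_events e
  JOIN all_users u ON u.user_id = e.user_id
  WHERE e.event_type = 'click'
    AND e.occurred_at >= '2019-07-01'        -- filters applied only after the join
  GROUP BY u.signup_source;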

This query could be re-written as follows to limit the amount of data carried into the JOIN:
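Continuing the same hypothetical example, pushing the predicates into the CTE means far fewer rows are carried through intermediate storage and into the join:

  WITH click_events AS (
      SELECT user_id, occurred_at
      FROM events
      WHERE event_type = 'click'
        AND occurred_at >= '2019-07-01'      -- filter before the join
  ), all_users AS (
      SELECT user_id, signup_source
      FROM users
  )
  SELECT u.signup_source, COUNT(*) AS click_count
  FROM click_events e
  JOIN all_users u ON u.user_id = e.user_id
  GROUP BY u.signup_source;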

We have a great new feature called Query Recommendations that proactively lets you know if your query is using a significant amount of intermediate storage.

query recommendations

Impact to Cluster Storage Utilization

This value is important when diagnosing spikes in cluster storage utilization. 

For example, let’s assume you see your cluster storage spiking over some time period, but don’t see a corresponding increase in data transferred (via the COPY command) into your cluster. You could search for all queries which have a large Memory to Disk value to identify which queries contributed to your cluster running out of disk space.

Here’s a real-world example. The following chart shows the actual disk space used in a cluster, over a 2 week period, broken down by schema. This particular chart shows consistent storage utilization over time, with small variation.

schema size

The chart of % disk utilization tells a different story. On the same cluster, over the same period, the disk utilization hits 100% quite frequently. This is caused by some queries using an extraordinary amount of intermediate storage.

disk utilization

One of the cool features we recently released, Cluster Recommendations, will surface queries with high disk utilization immediately. For this cluster, it appears that queries were using close to two terabytes of disk for intermediate results!

cluster recommendations

Conclusion

Monitoring both “Disk-based Queries” and “Intermediate Storage” is crucial to keeping your cluster healthy.  Keeping on top of this temporary disk utilization prevents your Amazon Redshift disks from filling up due to misbehaved queries, resulting in queries being killed and your users being interrupted.

One of the things we hear most often from our customers is that data is becoming more and more critical to their business, necessitating Data SLAs.

Take for example one customer who builds hardware devices for consumers. They use their Amazon Redshift cluster to monitor and provide support for customers on the health of those devices (billions of events per day) and also to manage the complex supply chain for their overseas production lines.

This company relies on the data warehouse to run their business, and if the data pipelines don’t complete in time, or queries are slow or fail – the business loses revenue and customer satisfaction plummets.

We see this trend across all of our customers. In response, businesses are beginning to demand service level agreements (SLAs) for queries and reports built on top of the data warehouse. These SLAs have a number of different service level indicators – the most prominent are:

In fact, we have one customer that demands a 10 second SLA for any query that is executed through their BI reporting tool. Analysts get fed up with the “spinning wheel of death” while waiting for a dashboard to load or a query to complete.

Measuring & Improving Data SLAs for Query Latency

Measuring service levels for query latency is trickier than it seems. To understand why, consider the sheer scale of workloads that a typical data warehouse handles. Our average customer runs over one million queries per week on their Amazon Redshift cluster. These queries span three categories of applications:

At the same time, each category of workload is growing:

So a single SLA across all the queries is meaningless. What you really need is to measure SLAs on a very granular basis – for each data application, down to the individual user level. For a real-world example, consider a company using Looker as their BI application. How do you easily filter out all the noise to see only the Looker queries? How do you monitor the performance of a single model?

Measuring SLAs is hard, but is only the beginning of the journey. Assuming you are able to measure the SLA from each of your data applications – the next step is to take steps to improve the SLA by finding and fixing issues impacting your workloads and users.

Introducing the Cluster Health Dashboard

The Cluster Health Dashboard helps data teams measure and improve SLAs. It does this by surfacing:

  1. SLA measures for any connected app, down to the individual user level
  2. Proactive Recommendations to improve performance and optimize costs
  3. The top 5 slowest queries, with quick links to Query Optimization Recommendations
Data SLA, dashboard

Measure Data SLAs for any App or Users

To see latency distribution for any connected app (see here for a list of supported apps), simply modify the filter at the top-right of the screen.

Query latency follows a power-law distribution: most queries in the warehouse run rather quickly, with a long tail of slow queries.

Measuring query latency is best done by looking at different points across that distribution – the most common being the P95 and P99 latencies. These are shown prominently at the top of the page.

Further, we show you a distribution of latencies across the entire cluster, grouped by user (by default). Click on the chart to quickly go to the queries on the Discover page.

Example of SLA metrics for “Looker PDT” queries.

data latency distribution

Cluster Recommendations

Our team has significant experience running Amazon Redshift at scale. In fact, we run a very large fleet of Amazon Redshift clusters as part of our product architecture. Based on this experience, we surface observations and recommendations on how to optimize the cost and performance of the cluster. The Recommendations cover things like:

evaluate sql queries

Find & Fix Slow Queries

The “Top 5 Longest Running Queries” list defaults to showing you the slowest queries in your cluster over the past day.

Data SLA, query insight

Click on any of these queries to go to a detail page showing recommendations for optimizing the query.

Query Insights

A query optimization that dramatically reduces runtime for queries which use window functions.

The Simplified Problem

A common problem we see our customers solving in SQL is how to compare two rows per group in a dataset.  For example, say you have the following data in the product_prices table:

Product | updated_on | Price
widget_A | 2019-07-03 12:00:00 | $105
widget_A | 2019-07-03 11:00:00 | $110
widget_A | 2019-07-03 10:00:00 | $100
widget_B | 2019-07-02 06:00:00 | $255
widget_B | 2019-07-02 05:00:00 | $235
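If you want to follow along, a minimal version of this table can be created as follows (the column types are assumed):

  CREATE TABLE product_prices (
      product    VARCHAR(32),
      updated_on TIMESTAMP,
      price      DECIMAL(10,2)
  );

  INSERT INTO product_prices VALUES
      ('widget_A', '2019-07-03 12:00:00', 105),
      ('widget_A', '2019-07-03 11:00:00', 110),
      ('widget_A', '2019-07-03 10:00:00', 100),
      ('widget_B', '2019-07-02 06:00:00', 255),
      ('widget_B', '2019-07-02 05:00:00', 235);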

How do you efficiently SELECT the price change at each updated_on time for each Product?  I.e. we’d like to see a result set of:

Product | Most recent updated time | Price change
widget_A | 2019-07-03 12:00:00 | -$5
widget_A | 2019-07-03 11:00:00 | $10
widget_B | 2019-07-02 06:00:00 | $20

This type of problem arises in a lot of domains.  Another common example of this problem occurs when you want to compare the time difference between subsequent events. For example, you may have a table that logs timestamps for each page visit of a user to your website and you want to do an analysis on the time difference between visits.

A common but sub-optimal way we see customers solve this problem is by using the ROW_NUMBER() window function together with a self join.  The algorithm is straightforward: first select all your product prices and order them within each product by updated_on using the ROW_NUMBER() window function.  Then self join on the table, mapping each row to the row preceding it.  In SQL:
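A sketch of that pattern against the product_prices table above (the ordered_prices CTE and row_number column match how the query is described in this post; the other aliases are illustrative):

  WITH ordered_prices AS (
      SELECT product,
             updated_on,
             price,
             ROW_NUMBER() OVER (PARTITION BY product ORDER BY updated_on DESC) AS row_number
      FROM product_prices
  )
  SELECT cur.product,
         cur.updated_on AS most_recent_updated_time,
         cur.price - prev.price AS price_change
  FROM ordered_prices cur
  JOIN ordered_prices prev                  -- self join: match each row to the next-oldest row
    ON prev.product    = cur.product
   AND prev.row_number = cur.row_number + 1;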

This query produces the desired result set, but at the cost of a join.  For real-world (i.e. less contrived) examples, solutions like this also often need to incorporate additional sub-selects or other expensive operations which compound the effect of the join and can make the query run excruciatingly slow.

This class of problem can manifest itself in other ways as well: sometimes instead of a self join, the query may just select from ordered_prices where row_number=1 to get the most recent updated_on and price.  But the general pattern is the same: using ROW_NUMBER() to label rows for further processing/joining.

A Better Solution

If you’re using ROW_NUMBER() in a query just to apply filters or joins later on in your transformation, there may be a better way.  Rather than co-opting ROW_NUMBER() to compare neighboring rows, a better solution is to apply a different window function that was designed to solve this problem:  LAG().  The LAG window function returns the value from a row offset from the current row.  The resulting SQL becomes:
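A sketch of the LAG-based version against the same product_prices table, ordering ascending by updated_on so LAG() reaches back to the previous (older) price:

  SELECT product,
         updated_on,
         price - LAG(price) OVER (PARTITION BY product ORDER BY updated_on) AS price_change
  FROM product_prices;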

This SQL says that we should order the rows within each product by updated_on, and take the difference between the price in the current row and the price in the previous row within that group.  This eliminates the need for the join and can greatly simplify the query plan. One note: this result set will include NULL rows at the start of each product (i.e. where there is no previous row), but these can be trivially eliminated with a WHERE clause of price_change is not null when the results are used (or this query can be wrapped in a subselect to remove those rows).

A Real-World Example

The seed for this blog post was planted when a potential customer came to us with the same question we hear all the time:  “Why are our queries slow?”  After instrumenting their cluster with intermix.io they were able to quickly identify their worst queries and find out why they are slow and what to do about it.  In their case, one of the worst offending queries used this ROW_NUMBER() pattern and triggered a large number of Query Recommendations in their intermix.io Dashboard, including:

Query Recommendations block from intermix.io

Since the query exhibited the ROW_NUMBER() pattern above, it caused multiple unnecessary table scans, an expensive join, significant IO skew, and a large amount of intermediate data to be written to disk during execution.  The query operated over a single table which had 876 million rows–not a huge table, but large enough to bring Amazon Redshift to its knees if not optimized properly–and was taking over 5.5 hours to execute!

After reviewing our Recommendations, they reimplemented their query using the LAG function to eliminate the join (which greatly simplified their query by removing two additional sub-selects), and the query execution time dropped to 30 seconds.  From 5.5 hours to 30 seconds simply by reviewing targeted Query Recommendations and implementing a straightforward solution–talk about a good ROI on time spent optimizing SQL.  Their exact feedback after seeing the new query runtimes? “OMFG. takes 30s. Wow”

Lessons Learned

Amazon Redshift is a fantastic general-purpose data warehouse. But since it is so easy to add data and new workloads to your cluster, it can be very hard to know where and how to start optimizing SQL queries. 

Analysts and data engineers often wonder which queries are the worst offenders and where limited SQL tuning effort should go to give the biggest bang-for-the-buck. Knowing where to put query tuning effort is crucial since seemingly simple-looking queries can sometimes have inefficiencies lurking which can compound to slow a query by several orders of magnitude. 

The good news: this is the problem that intermix.io Query Recommendations was designed to solve. We help you pinpoint the root cause of query slowness so that you can spend your limited time optimizing the right queries in the right way.