
intermix Blog

Best practices and lessons learned for cloud ETL and data engineering.

The Future of Apache Airflow

5 min READ
August 20th 2018
This is a guest blog post by Pete DeJoy. Pete is a Product Specialist at Astronomer, where he helps companies adopt Airflow. Apache Airflow has come a long way since it started as an internal project at Airbnb in 2014, thanks to the core contributors’ fantastic work in creating a very engaged […]
Pete DeJoy

Announcing Query Groups – Intelligent Query Classification

1 min READ
August 10th 2018
Query Groups is a powerful feature that intelligently classifies and ranks query workloads on your cluster. Query Groups can answer questions like: My cluster just experienced a sudden increase in latency. Which queries are causing this? Which queries are consuming the most cluster resources? Which queries are slowly increasing in latency? Have […]
Paul Lappas

Crowdsourcing Weather Data With Amazon Redshift

2 min READ
August 6th 2018
Have you ever tried setting up a personal weather station? Collecting digital weather data in your backyard or on your rooftop has recently become easy to do. Sharing it is easy too: the Citizen Weather Observer Program (CWOP) will transmit your backyard data to NOAA to help with forecasts. All you have to […]
Lucy Hancock

Building a Better Data Pipeline - The Importance of Being Idempotent

4 min READ
July 30th 2018
Introduction: At a glance, batch data processing seems simple. Pull data from a source, apply some business logic to it, and load it for later use. When done well, automating these jobs is a huge win. It saves time and empowers decision-makers with fresh and accurate data. But this kind of ETL (Extract, Transform, and […]
Cody Hanson

Improve Amazon Redshift COPY performance: Don’t ANALYZE on every COPY

1 min READ
July 24th 2018
Introduction: One of the core challenges of using any data warehouse is the process of moving data to a place where the data can be queried. Amazon Redshift provides two methods to access data: (1) copy data into Redshift local storage by using the COPY command; (2) use Amazon Redshift Spectrum to query S3 data […]
Paul Lappas