
intermix Blog

Best practices and lessons learned for cloud ETL and data engineering.

Crowdsourcing Weather Data With Amazon Redshift

2 min READ
August 6th 2018
Have you ever tried setting up a personal weather station? Collecting digital weather data in your backyard or on your rooftop has recently become easy. Sharing it is easy too: the Citizen Weather Observer Program (CWOP) will transmit your backyard data to NOAA to help with forecasts. All you have to […]
Lucy Hancock

Building a Better Data Pipeline - The Importance of Being Idempotent

4 min READ
July 30th 2018
Introduction: At a glance, batch data processing seems simple. Pull data from a source, apply some business logic to it, and load it for later use. When done well, automating these jobs is a huge win. It saves time and empowers decision-makers with fresh and accurate data. But this kind of ETL (Extract, Transform, and […]
Cody Hanson

Gradient Boosting Libraries — A Comparison

7 min READ
July 26th 2018
Gradient Boosted Machines: a residual’s worst enemy. Gradient boosting is one of the most widely used machine learning algorithms in data science. Residual-based boosting leads the pack in predictive accuracy and adaptability, making it a usual suspect among Kaggle-winning solutions and widely used in industry for problems across ranking, classification and regression […]
Tina Wenzel

The Future of Machine Learning in the Browser with TensorFlow.js

3 min READ
July 24th 2018
Up until now, I have done most of my machine learning work in Python. With the recent release of TensorFlow.js (TensorFlow for JavaScript), I decided to spend the last couple of months trying out machine learning in the browser. I am well aware of the intense relationship people have with JavaScript. But let’s be honest […]
Domas Bitvinskas

Improve Amazon Redshift COPY performance: Don’t ANALYZE on every COPY

1 min READ
July 24th 2018
Introduction: One of the core challenges of using any data warehouse is the process of moving data to a place where the data can be queried. Amazon Redshift provides two methods to access data: (1) copy data into Redshift local storage by using the COPY command; (2) use Amazon Redshift Spectrum to query S3 data […]
Paul Lappas