Modern ETL Tools for Amazon Redshift

Amazon Redshift was introduced in 2012 as one of the first cloud data warehouses, and it quickly became the fastest-growing service among Amazon Web Services. Amazon Redshift, together with the ETL tools built around it, permanently changed how teams work with analytical data: the focus shifted from the classical ETL approach to ELT. Let’s examine in detail what’s behind these abbreviations and why the shift matters.

ETL stands for: 

  • Extract data from the source system
  • Transform the data into a format suitable for analysis
  • Load the data into the warehouse.
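To make the three steps concrete, here is a minimal Python sketch of a classical ETL job. It is an illustration under stated assumptions, not how any particular tool works: a hard-coded list stands in for the source system, and an in-memory SQLite database (Python's built-in sqlite3 module) stands in for the warehouse. The function names and the customer_totals table are hypothetical. The key point is that the transformation (parsing amounts and summing them per customer) happens in application code, before anything reaches the warehouse.

```python
import sqlite3
from collections import defaultdict


def extract():
    # Hypothetical source system: a hard-coded list of raw order records.
    return [
        {"customer": "acme", "amount": "120.50"},
        {"customer": "acme", "amount": "79.50"},
        {"customer": "globex", "amount": "42.00"},
    ]


def transform(rows):
    # Transform OUTSIDE the warehouse: parse amounts and aggregate per customer.
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += float(row["amount"])
    return sorted(totals.items())


def load(conn, aggregated):
    # Load only the aggregated result; the raw rows never reach the warehouse.
    conn.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", aggregated)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())
# prints [('acme', 200.0), ('globex', 42.0)]
```

Because only the aggregate is loaded, answering a new question later (for example, totals per day) would mean changing the transform step and re-running the whole pipeline against the source.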

The standard analytics stack in the “old” on-premises world would look like this:

  • Some sort of Hadoop cluster for distributed data processing and storage
  • A data integration tool like Informatica that would take the data from Hadoop, aggregate it and transform it
  • An on-premises warehouse, like Teradata or Oracle, that receives the aggregated data from Informatica.

Once the data is in the warehouse, it doesn’t change anymore; it’s completely static.

ELT Tools for Amazon Redshift

“ETL” is an “old world” paradigm. With Amazon Redshift, cloud warehouses became powerful and cheap enough that there was no longer any reason to process data outside the warehouse. Instead, you load your raw data into the warehouse and run aggregations and transformations inside it. This explains the shift from “ETL” to “ELT”.
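A minimal ELT sketch in Python illustrates the difference. It assumes an in-memory SQLite database as a stand-in for Amazon Redshift and uses hypothetical table names (raw_orders, customer_totals). The raw rows are loaded unchanged, and the aggregation then runs as SQL inside the warehouse via a CREATE TABLE ... AS SELECT statement, a form that Redshift also supports.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as they arrive from the source.
raw_orders = [
    ("acme", "120.50"),
    ("acme", "79.50"),
    ("globex", "42.00"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

# Load: copy the raw data into the warehouse without transforming it.
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_orders)

# Transform: run the aggregation inside the warehouse with SQL.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer, SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    GROUP BY customer
    """
)
conn.commit()

print(conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())
# prints [('acme', 200.0), ('globex', 42.0)]
```

Because raw_orders stays in the warehouse, a new aggregation can be added later by writing another query, without re-extracting data from the source.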

A shift to a new data stack

The new generation of tools follows the “ELT” paradigm. They integrate many different data sources and then push the raw data into the warehouse. Let’s review the top modern ETL tools for Amazon Redshift in more detail.

Fivetran

Fivetran architecture
Image source: https://fivetran.com/docs/getting-started/architecture 

Fivetran is an easy-to-use cloud service that can gather all your data into your warehouse in minutes without any coding. It supports a wide range of data sources.

Fivetran automatically cleans and normalizes your data, organizes it into standardized schemas, and then routes it to your warehouse. Once the data from a connected source lands in your warehouse, you can apply custom transformations to it with SQL.

Fivetran is a scalable and secure solution that meets PCI, HIPAA, GDPR and SOC 2 criteria with advanced security features like SSH tunnels.

SimilarWeb Category Rank: 15,200

Pricing: Pricing available on request.

Alooma

Alooma dashboard
Image source: https://www.alooma.com/ 

Alooma is an enterprise data pipeline platform, recently acquired by Google. It is a scalable and secure cloud-based ETL tool that provides dozens of native integrations with various data sources.

One of Alooma’s unique features is the Code Engine, which lets you enrich and customize data with Python scripts.

Alooma Mapper provides automatic data mapping from any source to a specified destination. When an automatic mapping is not enough, you can make custom mappings in a simple and intuitive UI.

However, following the Google acquisition, Alooma announced that it will end support for Amazon Redshift in 2020.

SimilarWeb Category Rank: 7,306

Pricing: Pricing available on request.

Stitch

Stitch dashboard
Image source: https://www.stitchdata.com/docs/getting-started/set-up-stitch-data-pipeline 

Stitch is a developer-focused tool for rapidly moving data. It is built on top of Singer, an open-source ETL framework, and provides a rich collection of community-driven integrations. If you can’t find the integration you need, you can build it yourself or commission its development from an implementation partner. Stitch provides rich customization options for replication, enterprise-grade security, error handling, and advanced reports.
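In Singer, a source integration is a “tap”: a program that writes JSON messages to standard output, one per line. The sketch below shows the shape of a minimal tap in plain Python, without the singer-python helper library. The SCHEMA, RECORD and STATE message types follow the Singer specification; the “users” stream, its fields, the hard-coded rows and the bookmark layout inside STATE are hypothetical assumptions for illustration, not a production-ready Stitch integration.

```python
import json


def tap_messages(rows):
    """Build Singer messages for a hypothetical 'users' stream."""
    # SCHEMA: describes the stream with JSON Schema and names its primary key.
    messages = [{
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
        "key_properties": ["id"],
    }]
    # RECORD: one message per extracted row.
    messages += [{"type": "RECORD", "stream": "users", "record": row} for row in rows]
    # STATE: a bookmark the next run could resume from (its layout is up to the tap).
    last_id = max((row["id"] for row in rows), default=None)
    messages.append({"type": "STATE", "value": {"users": {"last_id": last_id}}})
    return messages


if __name__ == "__main__":
    # Hypothetical source data; a real tap would call an API or query a database.
    for message in tap_messages([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]):
        print(json.dumps(message))
```

In a Singer pipeline, this output is piped into a “target” program that loads the records into a destination.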

Stitch can be difficult to learn for non-technical users, but its high-quality documentation and almost unlimited platform extensibility compensate for this drawback.

SimilarWeb Category Rank: 4,906

Pricing: From a free plan (including 5 million rows per month and selected free integrations) to $1,250 per month.

Talend 

Talend dashboard

Talend Cloud Integration is a feature-rich solution for quickly synchronizing data between SaaS and on-premises applications. It also provides cloud-based data preparation tools to easily access, clean, transform, and enrich data.

Talend makes it easy to design cloud-to-cloud and hybrid integration workflows, and it automates the provisioning of your warehouse. Talend Cloud Integration provides powerful features for data integration, big data integration, and application integration.

Talend was named a Leader in the 2019 Gartner Magic Quadrant for Data Integration Tools.

SimilarWeb Category Rank: 1,567

Pricing: Pricing available on request.

Blendo

Blendo dashboard
Image source: https://app.blendo.co/ 

Blendo is a simple ETL and ELT data integration tool with a user-friendly web interface, built especially for non-technical users. Choose from dozens of ready-made data connectors to create ELT data integrations from almost any data source. Blendo collects your data, detects its structure, and optimizes it for your data warehouse.

Blendo lacks some database connectors, but the service is actively developing and constantly adding new data sources.

SimilarWeb Category Rank: 13,890

Pricing: From $125 per month for the standard package to $1,000 per month for the advanced package.

Hevo Data

Hevo Data dashboard
Image source: https://hevodata.com/blog/building-data-pipelines-hevo/ 

Hevo is a data integration platform that lets you set up your warehouse quickly. It brings data from any source to any destination without writing any code, and its user-friendly point-and-click interface lets you set up data integration logic in minutes.

Hevo supports hundreds of data sources and destinations. They can be integrated in real time with Hevo’s Real-time Streaming Architecture.

For advanced data processing and enrichment, Hevo allows writing custom data transformation scripts in Python.

Hevo provides automatic schema detection and mapping. With a robust Schema Registry Module, Hevo intelligently detects the schema changes of incoming data. It notifies you immediately with Real-time Alerts over email or Slack. 

For advanced users, Hevo also provides granular activity logs, version control, and extensive APIs for managing data pipelines programmatically.

SimilarWeb Category Rank: 14,248

Pricing: From $499 per month for the Starter plan to $999 per month for the Business plan. Enterprise plan price is available on request.

ETLeap

ETLeap dashboard
Image source: https://etleap.com/product/ 

ETLeap is an ETL tool for engineering, analytics, and data science teams. It helps build data pipelines and data warehouses without friction.

With ETLeap you can integrate with any of 50+ supported data sources. ETLeap will monitor and maintain your data pipelines for availability and completeness.

You can control your data pipelines from an intuitive dashboard. The interactive data wrangler automatically guesses how to parse incoming data and lets you adjust the result. Optionally, you can apply custom data transformations with scripts in any language.

ETLeap provides a secure and flexible ETL service. It is HIPAA-compliant and EU-US Privacy Shield certified, and it provides end-to-end encryption and single sign-on support. You can run it as a hosted solution (SaaS) or in your own VPC.

SimilarWeb Category Rank: 71,974 

Pricing: Pricing available on request.

Data Virtuality

Data Virtuality dashboard
Image source: https://datavirtuality.com/connectors/ 

Data Virtuality is an advanced data integration solution with user-friendly UI. It centralizes data from multiple sources. 

Data Virtuality supports 200+ data sources. It offers a virtual data layer for modeling and transforming data with SQL on the fly.

Data Virtuality lets you configure replication settings flexibly. It has pre-built templates for retrieving data and customizable data pipelines, and it can be hosted either in the cloud or on-premises. Distributed join optimizations, together with dynamic cost-based and rule-based optimizations, help ensure that data is integrated optimally.

SimilarWeb Category Rank: 22,810

Pricing: From $249 per month for the Starter bundle to $599 per month for the Large bundle. Enterprise bundle pricing is available on request.

FlyData

FlyData dashboard

FlyData allows you to replicate data changes in your relational database or CSV/JSON files directly to Amazon Redshift.

FlyData supports a limited list of data sources, but syncs them in real time directly into Amazon Redshift. FlyData promises 100% accuracy with each load, so your data is always up to date.

FlyData offers SecureTunnel VPN to securely access your database behind a firewall. It also provides an error-handling system optimized specifically for Amazon Redshift.

SimilarWeb Category Rank: 22,312

Pricing: From $199 per month to $4,667 per month.

Informatica

Informatica Cloud provides codeless, optimized integration with hundreds of applications and data sources, both on-premises and in the cloud. Gartner has named Informatica a Leader in several markets, including the Magic Quadrant for Enterprise Integration Platform as a Service, the Magic Quadrant for Data Integration Tools, and the Magic Quadrant for Data Quality Tools.

Informatica Cloud has everything you may need for data integration projects and initiatives. Informatica Cloud provides several hundred connectors to databases, cloud data lakes, on-premises and SaaS applications.

Informatica Cloud allows you to build advanced integrations quickly and run them at scale. You can use recommendations from the CLAIRE™ engine for automated parsing, and build complex data integrations in the mapping designer with out-of-the-box advanced data integration transformations.

Since this is a large product suite, its initial setup can be difficult, and it takes a lot of time to learn to use all of its capabilities to their full potential.

SimilarWeb Category Rank: 1,165

Pricing: Informatica provides various pricing plans starting at $2,000/month.

Matillion

Matillion dashboard
Image source: https://www.matillion.com/events/etl-vs-elt-whats-big-difference/ 

Matillion is a powerful and easy-to-use cloud ETL/ELT solution. It has a huge selection of pre-built connectors out-of-the-box. It provides an intuitive interface to visually orchestrate sophisticated data workflows.

A simple and intuitive UI lets you get started quickly, but Matillion also offers many advanced functions. Dozens of Transformation and Orchestration Components help you perform advanced data transformations, and you can orchestrate your ETL processes from start to finish in a graphical user interface.

Matillion provides a Generic API connector that integrates with almost any JSON- or XML-based API. The component converts the API into a pseudo-SQL dialect, which can then be queried in the same way as any normal database.
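The general idea behind querying an API with SQL can be sketched in a few lines of Python: flatten each JSON object into a table row, then query the rows with ordinary SQL. This is not Matillion’s implementation; the JSON payload, the items table and its column names are hypothetical, and an in-memory SQLite database stands in for the query engine.

```python
import json
import sqlite3

# Hypothetical JSON payload, as an API might return it.
payload = json.loads("""
{"items": [
  {"id": 1, "name": "Widget", "price": {"amount": 9.5, "currency": "USD"}},
  {"id": 2, "name": "Gadget", "price": {"amount": 20.0, "currency": "EUR"}}
]}
""")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER, name TEXT, price_amount REAL, price_currency TEXT)"
)

# Flatten nested objects into columns so each API item becomes one table row.
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (item["id"], item["name"], item["price"]["amount"], item["price"]["currency"])
        for item in payload["items"]
    ],
)

# Once flattened, the API data can be queried like any other table.
rows = conn.execute("SELECT name FROM items WHERE price_currency = 'USD'").fetchall()
print(rows)  # prints [('Widget',)]
```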

Ready-made components are the most convenient option, but Matillion also lets you write your own Python or Bash scripts to transform and enrich data.

SimilarWeb Category Rank: 7,216

Pricing: Pricing on Matillion depends on instance size, from “Medium” at $1.37 per hour, to “XLarge” at $5.48 per hour.

Conclusion

Modern ETL tools have a lot to offer users with very different backgrounds and needs. You can find a simple, easy-to-use solution for quickly integrating data from several sources, or build a complex data pipeline; in either case, the ability to enrich data using Python scripts is essential. One of the services in this review will likely offer a solution for your data pipelines.

Igor Bobriakov
