avatarInquisitive Intellect

Summary

This article provides a detailed walkthrough of the Amazon Aurora Zero-ETL integration with Redshift, explaining its benefits, considerations, and a step-by-step demonstration.

Abstract

The article discusses the traditional ETL (Extract, Transform, Load) processes and their challenges, such as system complexity, increased costs, and significant delays in analytics readiness. It then introduces the concept of Zero-ETL, which streamlines or eliminates the need for constructing ETL data pipelines, providing real-time or near-real-time data access. AWS supports Zero-ETL through various tools and features, including Amazon Aurora's Zero-ETL integration with Redshift. The article focuses on this integration, explaining its benefits, supported databases, considerations, and a step-by-step demonstration of setting up a Zero ETL integration between Amazon Aurora MySQL and Redshift Serverless.

Bullet points

  • Traditional ETL processes have historically been a bottleneck for data management due to their time-consuming development, maintenance, and scalability challenges.
  • Zero-ETL is an innovative approach that streamlines or entirely eliminates the need for constructing ETL data pipelines, revolutionizing how data integrations are handled.
  • AWS supports Zero-ETL efforts through various tools and features, including Amazon Aurora's Zero-ETL integration with Redshift.
  • The Amazon Aurora Zero-ETL integration with Redshift allows for seamless connectivity between RDS databases and an Amazon Redshift Data Warehouse.
  • The integration supports a limited selection of AWS databases, including Aurora MySQL Compatible Edition, Aurora PostgreSQL Compatible Edition (in preview), and RDS for MySQL (in preview).
  • There are several key considerations for both RDS and Redshift before setting up Zero-ETL Integrations, including the same AWS region, specific engine versions, and encryption settings.
  • The demonstration outlines the process for establishing a Zero ETL integration between Amazon Aurora MySQL and Redshift Serverless, involving several key steps such as creating a custom parameter group for the Aurora RDS, launching an Aurora MySQL instance, setting up a Redshift Serverless data warehouse, and implementing the Zero ETL integration.

AWS Zero ETL Between Aurora RDS and Redshift : Detailed Walkthrough

Overview

Traditional ETL (Extract, Transform, Load) processes have historically been a bottleneck for data management, characterized by their time-consuming development, maintenance, and scalability challenges.

Zero-ETL is an innovative approach that streamlines or entirely eliminates the need for constructing ETL data pipelines, revolutionizing how data integrations are handled. With Zero-ETL, the cumbersome tasks of extracting, transforming, and loading data are managed by AWS, automating the integration process and freeing up valuable resources.

In this post, I am going to provide a detailed walkthrough on the Amazon Aurora Zero-ETL integration with Redshift, supported with a step-by-step demonstration.

Background

ETL Challenges

Traditional ETL implementations come with their fair share of hurdles:

  • System complexity due to the intricacies of data integration, mapping, and inconsistencies.
  • Increased costs from duplicating data storage, scaling, and monitoring requirements.
  • Significant delays in analytics readiness caused by the need for custom code development, updates, and deployment.

Zero-ETL Benefits

Zero-ETL addresses these challenges head-on, offering numerous advantages:

  • It drastically reduces the complexity of data architectures, easing the burden on data engineering teams and enhancing operational agility.
  • Leveraging cloud-native technologies, Zero-ETL optimizes cost efficiency by minimizing infrastructure and maintenance expenses and reducing development time.
  • Most crucially, it provides real-time or near-real-time data access, a significant leap over traditional batch-oriented ETL processes.

Zero-ETL Support at AWS

AWS champions Zero-ETL efforts through a variety of tools and features, including:

  • Amazon Athena
  • Amazon Redshift Streaming Injection
  • Amazon Aurora’s Zero-ETL integration with Redshift
  • Amazon Redshift’s autocopy feature with S3
  • Enhanced data sharing and access control via Lake Formation

As discussed earlier, in this article, we will be focussing on the “Amazon Aurora Zero-ETL integration with Redshift”.

Amazon Aurora Zero-ETL Integration with Redshift

This integration allows for seamless connectivity between your RDS databases and an Amazon Redshift Data Warehouse. Defined by its source (RDS database) and target (Redshift Data Warehouse), the Zero-ETL integration facilitates instant data replication from the source to the target, ensuring your data is always up-to-date.

Supported Databases

Currently, Zero-ETL integration supports a limited selection of AWS databases, including:

  • Aurora MySQL Compatible Edition
  • Aurora PostgreSQL Compatible Edition (in preview)
  • RDS for MySQL (in preview)

It’s important to note that Aurora PostgreSQL and RDS for MySQL are in a preview state, hence not widely available yet. On the target side, there are no such limitations, with both Redshift Provisioned Clusters and Redshift Serverless being supported.

Considerations for Integration

Before setting up Zero-ETL Integrations, there are several key considerations for both RDS and Redshift:

For RDS:

  • The source DB cluster must be in the same AWS region as the target Redshift data warehouse.
  • The source must run on Aurora MySQL version 3.05 or the compatible version for Aurora PostgreSQL (PostgreSQL 15.4 version).
  • System tables, temporary tables, and views are not replicated to Redshift.
  • Zero-ETL is supported only with the InnoDB storage engine for Aurora MySQL.
  • Aurora PostgreSQL Zero-ETL is available only in the US East (N. Virginia) and US East (Ohio) regions.

For Redshift:

  • Zero-ETL integration is compatible with Redshift Serverless and RA3 node types for provisioned clusters.
  • The data warehouse must be encrypted.
  • Case sensitivity must be enabled.
  • The destination database is read-only, prohibiting the creation of new tables, views, or materialized views within it.
  • Classic resize is not supported for provisioned clusters with Zero-ETL integrations.

For both RDS and Redshift, there are additional detailed restrictions that may affect integration. For comprehensive guidance on these restrictions, you can refer the AWS documentation available here.

Demonstration

Overview

This technical demonstration outlines the process for establishing a Zero ETL (Extract, Transform, Load) integration between Amazon Aurora MySQL and Redshift Serverless.

ZeroETL: Core Components

The process involves several key steps:

  1. Creating a custom parameter group for the Aurora RDS (Relational Database Service).
  2. Launching an Aurora MySQL instance, specifically version 3.05 based on custom parameter group.
  3. Setting up a Redshift Serverless data warehouse.
  4. Implementing the Zero ETL integration

Each step requires detailed configuration, which will be covered during the demonstration. Although it’s feasible to set up this integration across different AWS accounts, for simplicity, this demonstration will use the same Virtual Private Cloud (VPC) and AWS account for both the source (Aurora MySQL) and target (Redshift Serverless) systems.

Step 1: Creating Custom DB Parameter Group

The initial step in setting up Zero ETL integration between Amazon Aurora and Redshift involves creating a custom DB cluster parameter group specific to Aurora MySQL 8.0.

This is essential because Aurora Zero ETL Integrations with Redshift require particular parameter values that control the replication feature.

The necessary parameters include —

  • aurora_enhanced_binlog=1,
  • binlog_backup=0,
  • binlog_format=ROW,
  • binlog_replication_globaldb=0,
  • binlog_row_image=full,
  • binlog_ row_metadata=full

Through the AWS console, the user with access to RDS APIs can access and configure these parameters within the parameter groups section.

zeroetl demo : custom parameter group

A custom parameter group named pg-aurora-mysql-zeroetl has been created for the DB cluster, specifically chosen for the Aurora MySQL 8.0 family. The values for these parameters have been set as discussed above.

Step 2: Creating Source : Aurora MySQL

In this step of the Zero ETL integration process, a source DB cluster, specifically an Aurora MySQL database, is created through the RDS console.

On the AWS Console, the option to “create a new database” is selected, choosing “Aurora MySQL compatible” engine. Please note that you are picking engine version as 3.05 only.

zeroetl demo : creating source database (Aurora MySQL)

Further configuration includes

  • The database is set up in a dev/test environment, with a designated name, default credential settings, and automatic password generation.
  • Cluster storage is configured with “Aurora standard” and “Burstable classes” for the instance configuration.
  • Default connectivity settings are selected, including the default VPC, subnet group, and public access enabled for remote connection.
  • A specific security group with inbound rule for port 3306 is chosen.
  • The default Aurora MySQL 8.0 parameter group is replaced with a custom parameter group we created for Zero ETL integration.

That’s all you need. Click on “Create Database” to create the source database for Zero-ETL integration.

Step 3: Creating Target : Redshift Serverless

To set up a Redshift Serverless workgroup for Zero ETL integration, the process starts with logging into the Redshift console with the right credentials.

A workgroup named target-redshift-serverless-workgroup is created, choosing the lowest capacity option (8) to optimize costs. The setup is within the same Virtual Private Cloud (VPC) as the RDS, to ensure connectivity and streamlined operations.

The default database and admin user credentials are accepted without customization for simplicity.

zeroetl demo : creating Redshift serverless

Following creation, it’s necessary to enable case sensitivity in the Redshift data warehouse, a requirement for Zero-ETL integration. By default, Redshift Serverless workgroups have case sensitivity disabled, which is then enabled using an AWS CLI command.

zeroetl demo : AWS CLI Command to update enable_case_sensitive_identifier

Additionally, a resource policy needs to be created to restrict specific integration sources and principles, ensuring secure and authorized connections.

zeroetl demo : configuring redshift resource policy

This involves adding an “Authorized integration sources”, in this case, the Aurora MySQL database created earlier, and specifying authorized principals to limit integration to designated users or roles.

Step 4: Creating Zero-ETL Integration

In this segment of the demonstration, the process of setting up a Zero ETL integration is showcased, highlighting the need to initiate the integration from the source side, as the Redshift console does not offer an option to create it directly.

The process begins in the RDS console, where a MySQL database has been prepared for integration. Once you provide the identifier, you can configure the source database. The AWS console automatically picks the aurora mysql database we created in Step 2.

zeroetl demo : creating integration

ZeroETL Integration: Review Page

The setup continues by selecting a target (Redshift Serverless), with the option to use the same or a different AWS account; in this case, the same account is used.

The integration is finalized without customizing encryption settings, relying on defaults.

The integration, now active, requires a new database to be created within the Redshift namespace to function properly. A database named “db1” is created, mirroring the name used in the Aurora MySQL setup (names could be different though).

ZeroETL Integration: Database not mapped

The Redshift query editor is accessed with Federated user credentials to verify the newly created database and its schemas, confirming the active state of the Zero ETL integration and its readiness for data transfer.

zeroetl demo : query editor showcasing db1 database is created on Redshift

Validating Zero-ETL Integration

Now that the Zero-ETL integration is in place, let’s validate it.

Initially, the target Redshift database shows no existing tables. Using DBeaver, a SQL editor on my local machine, a connection to the Aurora MySQL database named DB1 is established, confirming the absence of tables.

A sample table named “VENUE” is then created within DB1, successfully adding structure but no data.

zeroetl demo : sample Table (VENUE) created on RDS

Verification on the Redshift side shows the table has been replicated, indicating successful integration but still no records.

zeroetl demo : sample table (VENUE) replicated on Redshift

Subsequently, about 100 rows of sample data are inserted into the “venue” table in the Aurora MySQL database.

zeroetl demo : 100 records inserted inside VENUE Table on RDS

A query on Redshift confirms these records are replicated almost instantaneously, demonstrating the dynamic and real-time capabilities of the Zero ETL integration.

zeroetl demo : 100 records replicated on Redshift

This process verifies that updates or inserts in the source database are automatically and accurately reflected in the target Redshift database, confirming the operational success of the Zero ETL integration for seamless data synchronization and replication.

For more detailed instructions on setting up Zero-ETL integration between Amazon Aurora MySQL and Amazon Redshift, you can explore the tutorial available on YouTube:

AWS
Redshift
Zero Etl
Analytics
Quicktechbytes
Recommended from ReadMedium