With the now generally available Amazon RDS for MySQL zero-ETL integration with Amazon Redshift, near real-time analytics are possible.
Zero-ETL integrations help unify your data across applications and data sources, giving you holistic insights and breaking down data silos. They provide a fully managed, no-code, near real-time solution that makes petabytes of transactional data available in Amazon Redshift within seconds of being written to Amazon Relational Database Service (Amazon RDS) for MySQL.
This eliminates the need to build your own ETL jobs, which simplifies data ingestion, reduces operational overhead, and can potentially lower your overall data processing costs. Last year, AWS announced the general availability of zero-ETL integration with Amazon Redshift for Amazon Aurora MySQL-Compatible Edition, along with preview availability for Aurora PostgreSQL-Compatible Edition, Amazon DynamoDB, and RDS for MySQL.
AWS is pleased to announce that Amazon RDS for MySQL zero-ETL integration with Amazon Redshift is now generally available. This release also adds new features: data filtering, support for multiple integrations, and the ability to configure zero-ETL integrations in your AWS CloudFormation template.
Data filtering
Most businesses, regardless of size, can benefit from adding filtering to their ETL jobs. A common use case is reducing data processing and storage costs by replicating only the subset of data you need from production databases. Another is excluding personally identifiable information (PII) from a report's dataset. For instance, a healthcare company may want to exclude sensitive patient details when replicating data to build aggregate reports on recent patient cases.
Similarly, an online retailer may want to give its marketing department access to customer purchasing patterns while excluding all personally identifiable information. On the other hand, there are cases where you would not want to use filtering, such as when providing data to fraud detection teams that need all of the data in near real time to draw conclusions. These are just a few examples; we encourage you to experiment and find other use cases that may apply to your company.
Zero-ETL Integration
You can add filtering to your zero-ETL integrations in two ways: when you first create the integration, or by modifying an existing integration. In either case, the option is on the “Source” step of the zero-ETL creation wizard.
You apply filters by entering filter expressions in the format database.table, each of which either includes or excludes databases or tables from the dataset. You can add multiple expressions, and they are evaluated in order from left to right.
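To make the left-to-right evaluation concrete, here is a minimal Python sketch of how a list of filter expressions could be applied to a table name. It is an illustration, not the service's implementation. It assumes the expressions are written as "include: database.table" or "exclude: database.table" with * as a wildcard, that a later matching expression overrides an earlier one, and that a table matching no expression is replicated by default. The function names and the sample database and table names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_filter(data_filter):
    """Split 'include: a.b, exclude: c.*' into [(action, pattern), ...]."""
    rules = []
    for part in data_filter.split(","):
        action, _, pattern = part.partition(":")
        action, pattern = action.strip().lower(), pattern.strip()
        # Each expression must be include/exclude plus a database.table pattern.
        if action not in ("include", "exclude") or pattern.count(".") != 1:
            raise ValueError(f"bad filter expression: {part.strip()!r}")
        rules.append((action, pattern))
    return rules


def is_replicated(table, rules, default=True):
    """Walk the rules left to right; the last matching rule decides.

    'default' is the outcome when no rule matches (an assumption here).
    """
    decision = default
    for action, pattern in rules:
        if fnmatchcase(table, pattern):
            decision = (action == "include")
    return decision


# Hypothetical example: replicate the sales database except one PII table.
rules = parse_filter("include: sales.*, exclude: sales.customers_pii")
print(is_replicated("sales.orders", rules))
print(is_replicated("sales.customers_pii", rules))
```

Because order matters under these assumptions, swapping the two expressions in the example would bring the PII table back into the replicated set, so it is worth checking a filter against a few table names before applying it.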
If you are modifying an existing integration, the new filtering rules take effect once you confirm your changes, and Amazon Redshift drops the tables that are no longer included in the filter.
To dig deeper, we suggest reading the blog article that goes into detail on setting up data filters for Amazon Aurora zero-ETL integrations, since the steps and concepts are very similar.
Amazon Redshift Data Warehouse
Create multiple zero-ETL integrations from a single database
You can now also set up integrations from a single RDS for MySQL database to up to five Amazon Redshift data warehouses. The only restriction is that you must wait for the first integration to finish setting up successfully before adding others.
This lets you share transactional data with other teams while giving them ownership of their own data warehouses for their specific use cases. For instance, you can combine this with data filtering to send different data sets from the same Amazon RDS production database to development, staging, and production Amazon Redshift clusters.
Another interesting use case is consolidating Amazon Redshift clusters by using zero-ETL to replicate to different warehouses. You can also use Amazon Redshift materialized views to explore your data, power your dashboards, share data, run training jobs in Amazon SageMaker, and more.
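As a rough illustration of the fan-out described above, the following Python sketch plans one integration per target warehouse from a single source and enforces the five-integration limit mentioned in the text. It only builds a plan and calls no AWS API. The function name, integration names, ARNs, and filter strings are all hypothetical placeholders, and the "include:"/"exclude:" filter syntax is an assumption.

```python
MAX_INTEGRATIONS_PER_SOURCE = 5  # limit stated in the article


def plan_integrations(source_arn, targets):
    """Return one request dict per target warehouse, in creation order.

    'targets' maps an integration name to (target ARN, data filter string).
    The first entry should be created and fully set up before the others.
    """
    if len(targets) > MAX_INTEGRATIONS_PER_SOURCE:
        raise ValueError(
            f"at most {MAX_INTEGRATIONS_PER_SOURCE} integrations per source database"
        )
    return [
        {
            "IntegrationName": name,
            "SourceArn": source_arn,
            "TargetArn": target_arn,
            "DataFilter": data_filter,
        }
        for name, (target_arn, data_filter) in targets.items()
    ]


# Hypothetical ARNs and filters for three environments.
plan = plan_integrations(
    "arn:aws:rds:us-east-1:111122223333:db:example-prod-mysql",
    {
        "to-dev": ("arn:aws:redshift:us-east-1:111122223333:namespace:dev", "include: shop.catalog"),
        "to-staging": ("arn:aws:redshift:us-east-1:111122223333:namespace:stg", "include: shop.*, exclude: shop.customers"),
        "to-prod": ("arn:aws:redshift:us-east-1:111122223333:namespace:prd", "include: *.*"),
    },
)
for request in plan:
    print(request["IntegrationName"], "->", request["DataFilter"])
```

Keeping the plan as plain data makes it easy to review which data set each team receives before anything is created.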
In summary
With RDS for MySQL zero-ETL integrations with Amazon Redshift, you can replicate data for near real-time analytics without building and maintaining complex data pipelines. The feature is now generally available, with the ability to add filter expressions that include or exclude databases and tables from the replicated data sets. You can also now set up multiple integrations from the same source RDS for MySQL database to different Amazon Redshift warehouses, or create integrations from multiple sources to consolidate data into a single data warehouse.
This zero-ETL integration is available in supported AWS Regions for RDS for MySQL versions 8.0.32 and later, Amazon Redshift Serverless, and Amazon Redshift RA3 instance types.
In addition to the AWS Management Console, you can set up a zero-ETL integration with the AWS Command Line Interface (AWS CLI) or with an AWS SDK such as boto3, the official AWS SDK for Python.
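As a minimal sketch of the SDK route, the helper below wraps the `create_integration` call on a boto3 RDS client. The client is passed in as an argument so the snippet runs without AWS credentials. The helper names, the integration name, the ARNs, and the filter string are hypothetical, and the exact parameters your account and Region accept should be checked against the boto3 documentation before use.

```python
def build_integration_request(name, source_arn, target_arn, data_filter=None):
    """Assemble keyword arguments for the RDS create_integration call."""
    request = {
        "IntegrationName": name,
        "SourceArn": source_arn,   # ARN of the RDS for MySQL database
        "TargetArn": target_arn,   # ARN of the Amazon Redshift warehouse
    }
    if data_filter:
        # Optional filter expressions, e.g. "include: db.table" (assumed syntax).
        request["DataFilter"] = data_filter
    return request


def create_zero_etl_integration(rds_client, name, source_arn, target_arn, data_filter=None):
    """Create the integration using the supplied RDS client and return its response."""
    request = build_integration_request(name, source_arn, target_arn, data_filter)
    return rds_client.create_integration(**request)


# Usage with boto3 (requires credentials; ARNs below are placeholders):
#
#   import boto3
#   rds = boto3.client("rds")
#   create_zero_etl_integration(
#       rds,
#       name="example-integration",
#       source_arn="arn:aws:rds:us-east-1:111122223333:db:example-mysql",
#       target_arn="arn:aws:redshift-serverless:us-east-1:111122223333:namespace/example",
#       data_filter="include: shop.orders",
#   )
```

Separating request construction from the API call keeps the request easy to inspect and to test with a stub client before running it against a real account.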