AWS announced today that AWS Clean Rooms data collaborations now support Snowflake and Amazon Athena as new data sources. With AWS Clean Rooms, you and your partners can analyze your combined datasets more easily and securely without having to share or copy each other’s underlying data. This launch lets you collaborate on datasets that are stored in Snowflake, or that can be queried through Athena using features such as AWS Lake Formation permissions or AWS Glue Data Catalog views, without moving or revealing the underlying data.
To obtain insights for marketing and advertising campaigns, investments, or research and development, you frequently need to work with partners to analyze datasets. When your partners’ datasets are stored or managed outside of Amazon Simple Storage Service (Amazon S3), you need to minimize or eliminate the complexity, cost, compliance risk, and delay that come with moving or copying data. Businesses also find that copying data can lead to working with out-of-date information, which lowers the quality of the insights obtained.
With zero extract, transform, and load (zero-ETL), this launch enables businesses to collaborate on the most recent combined datasets in an AWS Clean Rooms collaboration. This eliminates the cost and complexity of migrating datasets out of their existing environments. For instance, a media publisher with data stored in Snowflake and an advertiser with data stored in Amazon S3 can run an audience overlap analysis to find the proportion of users present in both datasets, without having to build ETL data pipelines or share underlying data.
During the collaboration, no underlying data from external data sources is stored permanently in AWS Clean Rooms, and any data that is temporarily read into the analysis environment is deleted when the query completes. This simplifies the process of producing insights by enabling you to collaborate with your partners regardless of where their data is kept.
How to use multiple clouds and data sources in AWS Clean Rooms
I use a scenario with an advertiser (Company A) and a publisher (Company B) to illustrate this capability. Before launching an advertising campaign, Company A wants to find out how many of its high-value users can be reached on Company B’s website. Company A stores its data in Amazon S3. Company B stores its data in Snowflake. Both parties need their own AWS accounts to use AWS Clean Rooms.
The advertiser, Company A, is the collaboration creator in this demo. Company A creates the AWS Clean Rooms collaboration and invites Company B, whose data is stored in Snowflake, to join it. Detailed instructions for setting up a collaboration can be found in the blog post announcing the general availability of AWS Clean Rooms.
I then demonstrate how publisher Company B sets up a table in AWS Clean Rooms, designating Snowflake as the data source and supplying the Amazon Resource Name (ARN) of a secret in AWS Secrets Manager. AWS Secrets Manager helps you manage, retrieve, and rotate secrets such as database credentials throughout their lifecycles. Your secret must contain the credentials of a Snowflake user who has read-only access to the data you want to collaborate on. AWS Clean Rooms reads your secret and uses it to access the data stored in Snowflake. For more details on how to create your secret, refer to the Secrets Manager documentation.
Using Company B’s AWS account, I navigate to the AWS Clean Rooms console and select Tables under Configured resources. I choose Configure new table. Under Third-party clouds and data sources, I choose Snowflake. I enter the secret ARN for the Snowflake credentials of a role that has read access to the Snowflake table I want to collaborate on. These credentials are used to verify the identity of the entity attempting to access the Snowflake table and schema. If you don’t already have a secret, you can create one with the Store a new secret for this table option.
To define the table and schema details, I use the Import from file option and select the Columns View Information Schema CSV file that I saved from Snowflake, which fills in the information for me. You can also enter the information manually.
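Before importing the file, it can help to check what the export actually contains. The following is a minimal sketch, assuming the CSV was exported from Snowflake's INFORMATION_SCHEMA.COLUMNS view and includes the COLUMN_NAME, DATA_TYPE, and ORDINAL_POSITION columns. The function name, the sample rows, and the SEGMENT and VISITS columns are hypothetical and used only for illustration; this is not how the console parses the file.

```python
import csv
import io


def columns_from_information_schema_csv(csv_text):
    """Return [(column_name, data_type), ...] ordered by ORDINAL_POSITION.

    Assumes a CSV export of Snowflake's INFORMATION_SCHEMA.COLUMNS view.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Header case can differ between export tools, so normalise the keys.
        normalised = {
            key.strip().upper(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        position = int(normalised.get("ORDINAL_POSITION") or 0)
        rows.append((position, normalised["COLUMN_NAME"], normalised["DATA_TYPE"]))
    rows.sort(key=lambda item: item[0])
    return [(name, data_type) for _, name, data_type in rows]


# Hypothetical sample export; only the join column name comes from the article.
SAMPLE_CSV = """TABLE_NAME,COLUMN_NAME,ORDINAL_POSITION,DATA_TYPE
SYNTHETIC_CUSTOMER_DATA,PUBLISHER_HASHED_EMAIL_ADDRESS,1,TEXT
SYNTHETIC_CUSTOMER_DATA,SEGMENT,2,TEXT
SYNTHETIC_CUSTOMER_DATA,VISITS,3,NUMBER
"""

if __name__ == "__main__":
    for name, data_type in columns_from_information_schema_csv(SAMPLE_CSV):
        print(f"{name}: {data_type}")
```

Listing the columns this way makes it easier to confirm that the join key (here, the hashed email address) is present before the table is configured.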
I navigate to the configured table and review its details, including the columns that can be queried and the AWS accounts that can create queries. On this page, I can change the table name, description, and analysis rule.
To make a table available for collaboration analysis in AWS Clean Rooms, I have to set up an analysis rule. An analysis rule is a privacy-enhancing control that each data owner sets on a configured table. It determines how the configured table can be analyzed. To set up a custom analysis rule that permits custom queries to run on the configured table, I select Configure analysis rule.
In Step 2, under Analyses for direct querying, I select Allow any queries created by specific collaborators to run without review on this table. With this option, only queries provided by the AWS accounts in the list of allowed accounts can run on the table. All analysis templates created by the allowed accounts are automatically allowed to run on this table without review. I select the allowed account under AWS account ID and choose Next.
In Step 3, I continue with the selections. To allow all columns to appear in the query output, I select None under Columns not allowed in the output. Under Additional analyses applied to the output, I select Not allowed, so no additional analyses can be run on this table. I choose Next.
In the last step, I review the configuration and choose Configure analysis rule.
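The same choices can also be written down as a policy document. The sketch below builds one in Python, assuming the field names of the custom analysis rule in the AWS Clean Rooms API (allowedAnalyses, allowedAnalysisProviders, additionalAnalyses) and the ANY_QUERY value. Treat these names as assumptions and check them against the current API reference before relying on them. The helper name and the account ID 111122223333 are placeholders, and nothing here calls AWS.

```python
import json


def build_custom_analysis_rule(allowed_account_ids):
    """Build a custom analysis rule policy mirroring the console choices.

    Field names are assumptions based on the Clean Rooms API; verify them
    against the current reference. The helper itself is hypothetical.
    """
    if not allowed_account_ids:
        raise ValueError("at least one allowed AWS account ID is required")
    for account_id in allowed_account_ids:
        if not (account_id.isdigit() and len(account_id) == 12):
            raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return {
        "v1": {
            "custom": {
                # Step 2: any query from the listed accounts runs without review.
                "allowedAnalyses": ["ANY_QUERY"],
                "allowedAnalysisProviders": list(allowed_account_ids),
                # Step 3: no additional analyses on the output. No output
                # columns are listed as disallowed, which is assumed to
                # correspond to choosing None in the console.
                "additionalAnalyses": "NOT_ALLOWED",
            }
        }
    }


if __name__ == "__main__":
    policy = build_custom_analysis_rule(["111122223333"])
    print(json.dumps(policy, indent=2))
```

With boto3, a dictionary like this would typically be passed as the analysisRulePolicy argument of the cleanrooms client's create_configured_table_analysis_rule call, with analysisRuleType set to CUSTOM. That call is not made here, so the sketch runs without AWS credentials.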
Next, I associate the table with the collaboration that advertiser Company A created, using Associate.
In the pop-up window, I select the collaboration from the list of collaborations with active memberships and pick Choose collaboration.
On the next page, I select the configured table name and enter a name for the table association. I choose a method for authorizing AWS Clean Rooms to query the table, and then choose Associate table.
Without having access to one another’s raw data, Company A, the advertiser, and Company B, the publisher, can now run an audience overlap analysis to determine the proportion of users present in both datasets. The analysis shows how much of the advertiser’s audience the publisher can reach. Without either side having to move or share its source data, the advertiser can assess the overlap to see whether the publisher offers unique reach or whether the publisher’s audience largely overlaps with its current audience. I switch to Company A’s account and navigate to the AWS Clean Rooms console. To obtain the audience overlap analysis result, I select the collaboration I created and run the following query:
SQL
select count (distinct advertiser.emailaddress)
from customer_data_example as advertiser
inner join synthetic_customer_data as publisher
on advertiser.emailaddress = publisher.publisher_hashed_email_address
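To make the overlap calculation concrete, here is a minimal local sketch that runs the same kind of join against an in-memory SQLite database. It is not AWS Clean Rooms and does not reproduce its privacy controls. The table and column names are taken from the query above, while the function name and the sample identifiers are made up for illustration. The join condition compares the two columns directly, because quoting the names would compare two string literals instead of the column values.

```python
import sqlite3


def audience_overlap(advertiser_emails, publisher_emails):
    """Return (overlap_count, share_of_advertiser_audience) for two ID lists."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table customer_data_example (emailaddress text)")
        conn.execute(
            "create table synthetic_customer_data "
            "(publisher_hashed_email_address text)"
        )
        conn.executemany(
            "insert into customer_data_example values (?)",
            [(email,) for email in advertiser_emails],
        )
        conn.executemany(
            "insert into synthetic_customer_data values (?)",
            [(email,) for email in publisher_emails],
        )
        # Distinct advertiser identifiers that also appear on the publisher side.
        (overlap,) = conn.execute(
            """
            select count(distinct advertiser.emailaddress)
            from customer_data_example as advertiser
            inner join synthetic_customer_data as publisher
              on advertiser.emailaddress = publisher.publisher_hashed_email_address
            """
        ).fetchone()
        (total,) = conn.execute(
            "select count(distinct emailaddress) from customer_data_example"
        ).fetchone()
    finally:
        conn.close()
    share = overlap / total if total else 0.0
    return overlap, share


if __name__ == "__main__":
    # Made-up hashed identifiers; "d4" appears twice on the publisher side.
    overlap, share = audience_overlap(
        ["a1", "b2", "c3", "d4"], ["b2", "d4", "d4", "z9"]
    )
    print(f"{overlap} advertiser identifiers overlap ({share:.0%})")
```

Dividing the overlap count by the number of distinct advertiser identifiers gives the proportion of the advertiser's audience that the publisher can reach.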
I used Snowflake as the data source in this example. You can also run queries on data using Athena while honoring AWS Lake Formation permissions. This enables row- and column-level filtering with Lake Formation fine-grained access control, as well as data transformation with AWS Glue Data Catalog views before the datasets are associated with the collaboration.