Friday, December 20, 2024

Amazon SageMaker Lakehouse: One Copy of Data, Broad AI/ML Capability


Amazon Athena federated queries now support SageMaker Lakehouse access controls

AWS unveiled the next generation of Amazon SageMaker, a single platform for data, analytics, and AI that brings together popular AWS analytics and machine learning capabilities. Its central component is SageMaker Unified Studio, a unified environment for data and AI development that spans big data processing, fast SQL analytics, model building and training, generative AI application development, and data exploration, preparation, and integration.

Among the features in this release is Amazon SageMaker Lakehouse, which lets you build robust analytics and artificial intelligence and machine learning (AI/ML) applications on a single copy of data by unifying data from data lakes and data warehouses.


Along with these releases, Amazon SageMaker Lakehouse now includes data catalogue and permissions capabilities that let you connect to, discover, and centrally manage permissions for data sources.

Organisations store data across several systems to optimise for particular use cases and scalability needs. This frequently results in data silos across databases, streaming services, data lakes, and data warehouses. Connecting to and analysing data from these various sources is difficult for analysts and data scientists: they have to maintain several sets of access permissions, set up specific connectors for every data source, and frequently copy data, which raises costs and can lead to inconsistent data.

The new capability tackles these issues by making it easier to connect to popular data sources, catalogue them, grant permissions, and make the data available for analysis through SageMaker Lakehouse and Amazon Athena. Wherever your data sources are located, you can use the AWS Glue Data Catalog as a single metadata store, giving you a centralised view of all available data.
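As a rough illustration, a data source connection can be registered once through the AWS Glue API; the connection name, JDBC URL, and Secrets Manager secret name below are hypothetical placeholders, not values from the announcement.

```python
import boto3

# A hedged sketch of registering a data source connection once so it can be
# reused later. All names and the JDBC URL are illustrative placeholders.
glue = boto3.client("glue", region_name="us-east-1")

glue.create_connection(
    ConnectionInput={
        "Name": "sales-mysql",  # hypothetical connection name
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:mysql://db.example.com:3306/sales_db",
            "SECRET_ID": "sales-db-credentials",  # credentials kept in AWS Secrets Manager
        },
    }
)
```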

Data source connections are created once and can be reused, so you don’t need to re-establish them. As you connect to data sources, databases and tables are automatically catalogued and registered with AWS Lake Formation. Once catalogued, you grant data analysts access to those databases and tables, so they neither have to connect to each data source separately nor need to know the connection credentials for those sources.
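A minimal sketch of that grant step, assuming a hypothetical catalogued database, table, and analyst IAM role:

```python
import boto3

# Grant SELECT on a catalogued table via AWS Lake Formation. The database,
# table, and role ARN are placeholders for illustration only.
lakeformation = boto3.client("lakeformation", region_name="us-east-1")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst-role"
    },
    Resource={
        "Table": {
            "DatabaseName": "sales_db",  # hypothetical catalogued database
            "Name": "orders",            # hypothetical catalogued table
        }
    },
    Permissions=["SELECT"],
)
```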


When querying with Athena, Lake Formation permissions can enforce fine-grained access control (FGAC) restrictions across data warehouses, OLTP data sources, and data lakes. Because data remains in its original location, you save the time and money otherwise spent on transfers and duplication. With the Data Catalog, you can configure built-in connectors to Amazon S3, Amazon Redshift, Amazon Aurora, Amazon DynamoDB, Google BigQuery, and more, and you can create or reuse data source connections.
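As a hedged sketch, a federated query against such a connected catalogue might be started through the Athena API as follows; the catalogue, database, table, and S3 output location are illustrative placeholders, and Lake Formation permissions are enforced at query time.

```python
import boto3

# Run an Athena federated query against a source connected through
# SageMaker Lakehouse. All names below are hypothetical placeholders.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString='SELECT * FROM "mysql_catalog"."sales_db"."orders" LIMIT 10',
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```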

Available now

Amazon SageMaker Lakehouse’s data catalogue and permissions capabilities make interactive analytics easier through federated queries, connecting multiple data sources to a unified catalogue and permission model in the Data Catalog. Fine-grained security policies across data lakes, data warehouses, and OLTP data sources are defined and enforced in one place, giving a fast query experience.

This capability is offered in the US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), and Asia Pacific (Tokyo) AWS Regions.

Amazon SageMaker Lakehouse and Redshift offer zero-ETL application connectors

AWS announced that zero-ETL application connectors for Amazon Redshift and Amazon SageMaker Lakehouse are now generally available. By unifying all of your data from Amazon Redshift data warehouses and Amazon Simple Storage Service (Amazon S3) data lakes, Amazon SageMaker Lakehouse enables you to create robust analytics and AI/ML applications using a single copy of data. With SageMaker Lakehouse, you can use any tool or engine compatible with Apache Iceberg to access and query your data in place. For common ingestion and replication use cases, AWS’s zero-ETL suite of fully managed connectors reduces the need to build ETL data pipelines.
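As one illustration of that in-place access, the open source PyIceberg client (standing in here for any Iceberg-compatible tool) can read a lakehouse table through the AWS Glue Data Catalog; the database and table names below are hypothetical, and Iceberg-format tables plus default AWS credentials are assumed.

```python
from pyiceberg.catalog import load_catalog

# Open the AWS Glue Data Catalog as an Iceberg catalog and read a table
# in place, without copying the data. Names are placeholders.
catalog = load_catalog("glue", **{"type": "glue"})

table = catalog.load_table("sales_db.orders")  # hypothetical Iceberg table
batch = table.scan(limit=10).to_arrow()        # scan the first few rows
print(batch.to_pandas())
```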

With zero-ETL connectors for applications such as Salesforce, SAP, and Zendesk, you can spend less time building data pipelines and more time running unified analytics on all of your data in Amazon SageMaker Lakehouse and Amazon Redshift.

As businesses adopt more digital technology, data fragmentation becomes a significant problem: valuable data ends up stored across numerous databases, applications, and other platforms. To fully utilise their data, businesses must enable access to and consolidation of these many sources. To address this, users build data pipelines that extract and load (EL) data from various applications into centralised data lakes and data warehouses. With zero-ETL, you can avoid the weeks of engineering work required to design, build, and test such pipelines, and instead efficiently replicate valuable data from your enterprise resource planning (ERP), customer support, and customer relationship management (CRM) applications into data lakes and data warehouses for analytics and AI/ML.

Available now

Amazon SageMaker Lakehouse and Amazon Redshift now support zero-ETL integrations from applications in the US East, Asia Pacific (Hong Kong, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland, Stockholm) AWS Regions.

New Amazon SageMaker Lakehouse simplifies AI/ML and analytics

Amazon SageMaker Lakehouse, which unifies data across Amazon Redshift and Amazon S3, is now available, letting you build powerful analytics and AI/ML applications on a single copy of data. SageMaker Lakehouse is part of the next generation of Amazon SageMaker, a unified platform for data, analytics, and artificial intelligence that brings together popular AWS machine learning and analytics capabilities.

Customers want to use their data for more purposes, and they choose the best databases and storage for it to accelerate their analytics journey. But data spread across applications, data lakes, and data warehouses forms silos that are difficult to access and use. This fragmentation causes duplicate data copies and complex data pipelines, increasing costs. How and where data is stored also limits customers’ choices, forcing them onto particular query engines and tools and preventing them from working with the data as they would like. Finally, inconsistent data access makes it difficult for customers to make well-informed business decisions.

SageMaker Lakehouse tackles these issues by helping you unify data from Amazon Redshift data warehouses and Amazon S3 data lakes. It gives you the freedom to access and query data in place with any engine or tool compatible with Apache Iceberg. Data sharing and collaboration are made easier by SageMaker Lakehouse’s ability to centrally define and enforce fine-grained permissions across various AWS services. Getting data into SageMaker Lakehouse is easy: in addition to your data lakes and warehouses, you can use zero-ETL from operational databases such as Amazon Aurora, Amazon RDS for MySQL, and Amazon DynamoDB, as well as from applications such as Salesforce and SAP. SageMaker Lakehouse integrates well with your existing environments.

Available now

SageMaker Lakehouse can be accessed via the AWS Management Console, AWS CLI, SDKs, or APIs, and also through AWS Lake Formation and the AWS Glue Data Catalog. SageMaker Lakehouse is available in the Canada (Central), Europe (Ireland, Frankfurt, Stockholm, London), Asia Pacific (Sydney, Hong Kong, Tokyo, Singapore), and South America (São Paulo) AWS Regions.
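A short sketch of the SDK path, listing the databases and tables that the Data Catalog exposes; what is returned naturally depends on the sources you have connected, and pagination is omitted for brevity.

```python
import boto3

# Enumerate the databases and tables visible through the AWS Glue Data
# Catalog, which backs SageMaker Lakehouse. Output depends on your account.
glue = boto3.client("glue", region_name="us-east-1")

for database in glue.get_databases()["DatabaseList"]:
    tables = glue.get_tables(DatabaseName=database["Name"])["TableList"]
    print(database["Name"], [t["Name"] for t in tables])
```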

New Amazon DynamoDB-SageMaker Lakehouse zero-ETL integration

More than a million customers rely on Amazon DynamoDB, a serverless NoSQL database, to build high-scale, low-latency applications. As data volumes grow, organisations continuously look for ways to extract insights from the operational data frequently stored in DynamoDB. To use that data for analytics and machine learning (ML) use cases, however, customers often build bespoke data pipelines: a time-consuming infrastructure activity that adds little distinctive value to their core business.

With the new Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse, you can now run analytics and machine learning workloads in only a few clicks, without consuming your DynamoDB table capacity. Amazon SageMaker Lakehouse, which unifies all of your data across Amazon S3 data lakes and Amazon Redshift data warehouses, makes it easier to build strong analytics and AI/ML applications on a single copy of data.

Zero-ETL is a set of integrations that reduces or eliminates the need to build ETL data pipelines. This integration lowers the complexity of the engineering effort required to create and manage data pipelines, benefiting users who run analytics and machine learning workloads on operational data in Amazon DynamoDB without affecting production workflows.
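As a hedged sketch, once the integration has replicated a DynamoDB table into the lakehouse, it can be queried like any other lakehouse table, without consuming DynamoDB capacity; the database, table, column names, and output location below are hypothetical.

```python
import boto3

# Query DynamoDB data that a zero-ETL integration has replicated into the
# lakehouse, using Athena. All names are illustrative placeholders.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString=(
        "SELECT customer_id, COUNT(*) AS order_count "
        "FROM lakehouse_db.ddb_orders "  # hypothetical replicated table
        "GROUP BY customer_id"
    ),
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])
```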

Now available

The following AWS Regions offer this new zero-ETL capability: Europe (Frankfurt, Ireland, Stockholm), Asia Pacific (Hong Kong, Singapore, Sydney, Tokyo), US East (N. Virginia, Ohio), and US West (Oregon).
