Monday, March 24, 2025

AWS Lake Formation: Data Lake Security, Access Control On S3

What is AWS Lake Formation?

AWS Lake Formation helps you centrally govern, secure, and globally share data for analytics and machine learning. With Lake Formation, you can manage fine-grained access control for your data lake data on Amazon Simple Storage Service (Amazon S3) and for its metadata in the AWS Glue Data Catalog.

Lake Formation provides its own permissions model, which works alongside the IAM permissions model. It grants fine-grained access to data in data lakes and in external data sources, such as Amazon Redshift data warehouses, Amazon DynamoDB databases, and third-party data sources, through a simple grant or revoke mechanism similar to that of an RDBMS. Lake Formation permissions are enforced with granular controls at the column, row, and cell levels across Amazon Athena, Amazon QuickSight, Amazon Redshift Spectrum, Amazon EMR, and AWS Glue.
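As a rough illustration of the grant or revoke mechanism, the sketch below builds a column-level request for the boto3 Lake Formation client, whose grant_permissions and revoke_permissions operations accept the same Principal, Resource, and Permissions arguments. The role ARN, database, table, and column names are hypothetical placeholders, and the helper function names are assumptions made for this example rather than part of any AWS library.

```python
from typing import Any, Dict, List


def build_permission_request(
    principal_arn: str,
    database: str,
    table: str,
    columns: List[str],
    permissions: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments usable for both grant_permissions and revoke_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            # TableWithColumns limits the permission to the listed columns.
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": permissions,
    }


def grant(request: Dict[str, Any]) -> None:
    """Send the grant to Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    boto3.client("lakeformation").grant_permissions(**request)


def revoke(request: Dict[str, Any]) -> None:
    """Undo a grant by sending the same request to revoke_permissions."""
    import boto3

    boto3.client("lakeformation").revoke_permissions(**request)


# Hypothetical principal and table, used only for illustration.
request = build_permission_request(
    principal_arn="arn:aws:iam::111122223333:role/AnalystRole",
    database="sales_db",
    table="orders",
    columns=["order_id", "order_date"],
    permissions=["SELECT"],
)
```

Calling grant(request) would give the role SELECT on the two listed columns only, and revoke(request) would withdraw it. Building the request separately from sending it keeps the example inspectable without AWS access.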

How Does AWS Lake Formation Work?

AWS Lake Formation provides a relational database management system (RDBMS) permissions model for granting or revoking access to Data Catalog resources, such as databases, tables, and columns, whose underlying data is stored in Amazon S3. These simpler Lake Formation permissions take the place of complex Amazon S3 bucket policies and the corresponding Identity and Access Management (IAM) policies.

Lake Formation enforces permissions at two levels:

  • Metadata level: enforcing permissions on Data Catalog resources such as databases and tables
  • Storage level: managing access to the underlying data stored in Amazon S3 on behalf of integrated engines

Lake Formation permissions management workflow

AWS Lake Formation integrates with analytical engines so that they can query the Amazon S3 data stores and metadata objects registered with Lake Formation. The following diagram shows how permissions management works in Lake Formation.

Figure: Lake Formation permissions management workflow (image credit: AWS)

Lake Formation permissions management high-level steps

Before Lake Formation can provide access controls for the data in your data lake, a data lake administrator or a user with administrative permissions must set up user policies for individual Data Catalog tables, using Lake Formation permissions to allow or deny access to those tables.

Then the data lake administrator, or a user delegated by the administrator, grants users Lake Formation permissions on the Data Catalog databases and tables and registers the table's Amazon S3 location with Lake Formation.
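The registration step can be sketched with the boto3 Lake Formation client's register_resource operation, which takes the ARN of the Amazon S3 location. The bucket and prefix below are hypothetical, the helper names are assumptions made for this example, and using the service-linked role is one possible choice rather than a requirement stated in this article.

```python
from typing import Any, Dict


def s3_location_arn(bucket: str, prefix: str = "") -> str:
    """Return the ARN of an S3 bucket or of a prefix inside it."""
    prefix = prefix.strip("/")
    if prefix:
        return f"arn:aws:s3:::{bucket}/{prefix}"
    return f"arn:aws:s3:::{bucket}"


def build_register_request(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build keyword arguments for the register_resource operation."""
    return {
        "ResourceArn": s3_location_arn(bucket, prefix),
        # Assumption: let Lake Formation use its service-linked role for access.
        "UseServiceLinkedRole": True,
    }


def register_location(request: Dict[str, Any]) -> None:
    """Register the location with Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without the SDK

    boto3.client("lakeformation").register_resource(**request)


# Hypothetical bucket and prefix holding the table's data files.
register_request = build_register_request("example-data-lake", "/sales/orders/")
```

Once a location is registered, tables that point at it fall under Lake Formation's storage-level control as described in the steps that follow.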

  • Obtain metadata: A principal (user) submits a query or an ETL script to an integrated analytical engine, such as Amazon Athena, AWS Glue, Amazon EMR, or Amazon Redshift Spectrum. The engine identifies the table being requested and sends a metadata request to the Data Catalog.
  • Verify permissions: The Data Catalog checks the user's permissions with Lake Formation and, if the user is authorised, returns to the engine only the metadata that the user is permitted to see.
  • Obtain credentials: The Data Catalog tells the engine whether the table is managed by Lake Formation. If the underlying data is registered with Lake Formation, the engine asks Lake Formation for temporary access to the data.
  • Obtain data: If the user is authorised to access the table, Lake Formation gives the engine temporary credentials. The engine uses them to retrieve the data from Amazon S3, applies any required column, row, or cell filtering, and returns the results to the user once the job completes. This process is called credential vending.

If the table is not managed by Lake Formation, the analytical engine makes its second call directly to Amazon S3. Access to the data is then evaluated against the relevant Amazon S3 bucket policy and IAM user policy.
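The four steps and the fallback path can be traced with a small in-memory model. This is a toy simulation written for this article, not AWS code: the class and function names are invented, column filtering stands in for all fine-grained filtering, and the direct Amazon S3 path simply assumes that IAM and bucket policies allow the read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Table:
    name: str
    columns: List[str]
    rows: List[Dict[str, object]]
    registered: bool  # True when the S3 location is registered with Lake Formation


@dataclass
class QueryResult:
    path: str  # "lake-formation" or "direct-s3"
    rows: List[Dict[str, object]]


@dataclass
class LakeFormationModel:
    # (principal, table name) -> columns the principal may read
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def allowed_columns(self, principal: str, table: Table) -> Set[str]:
        return self.grants.get((principal, table.name), set())

    def vend_credentials(self, principal: str, table: Table) -> Optional[str]:
        """Return a stand-in for temporary credentials, or None if access is denied."""
        if self.allowed_columns(principal, table):
            return f"temporary-credentials-for-{principal}"
        return None


def run_query(principal: str, table: Table, lf: LakeFormationModel) -> QueryResult:
    # Steps 1 and 2: the catalog returns only the metadata the principal may see.
    visible = [c for c in table.columns if c in lf.allowed_columns(principal, table)]
    if not visible:
        raise PermissionError(f"{principal} has no permissions on {table.name}")

    # Step 3: the catalog reports whether Lake Formation manages the table.
    if not table.registered:
        # Fallback: the engine reads S3 directly; policy evaluation is not modelled.
        return QueryResult("direct-s3", [dict(row) for row in table.rows])

    credentials = lf.vend_credentials(principal, table)
    if credentials is None:
        raise PermissionError(f"no credentials vended for {principal}")

    # Step 4: the engine reads the data and applies column filtering.
    filtered = [{c: row[c] for c in visible} for row in table.rows]
    return QueryResult("lake-formation", filtered)


orders = Table(
    name="orders",
    columns=["order_id", "customer_email", "total"],
    rows=[
        {"order_id": 1, "customer_email": "a@example.com", "total": 10.0},
        {"order_id": 2, "customer_email": "b@example.com", "total": 25.5},
    ],
    registered=True,
)
lake_formation = LakeFormationModel()
lake_formation.grants[("analyst", "orders")] = {"order_id", "total"}

result = run_query("analyst", orders, lake_formation)
```

Here the analyst receives the two rows without the customer_email column, because that column is not part of the grant. A principal with no grant on the table is rejected at the metadata step, before any credentials are requested.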

Components of AWS Lake Formation

AWS Lake Formation relies on the interaction of several components to create and manage your data lake.

Lake Formation console

You use the Lake Formation console to define and manage your data lake and to grant and revoke Lake Formation permissions. Blueprints on the console let you discover, cleanse, transform, and ingest data. You can also enable or disable console access for individual Lake Formation users.

Lake Formation API and Command Line Interface

Lake Formation provides API operations through the AWS Command Line Interface (AWS CLI) and several language-specific SDKs. The Lake Formation API works in conjunction with the AWS Glue API. The Lake Formation API focuses primarily on managing Lake Formation permissions, while the AWS Glue API provides a data catalog API and a managed infrastructure for defining, scheduling, and running ETL operations on your data.
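To show how the two APIs divide the work, the sketch below reads a table's column metadata through the boto3 Glue client (get_table) and lists the permissions granted on the same table through the boto3 Lake Formation client (list_permissions). The helper names and the database and table names are hypothetical, and pagination of list_permissions results is left out for brevity.

```python
from typing import Any, Dict


def glue_get_table_request(database: str, table: str) -> Dict[str, Any]:
    """Keyword arguments for the Glue get_table operation (catalog metadata)."""
    return {"DatabaseName": database, "Name": table}


def lakeformation_list_permissions_request(database: str, table: str) -> Dict[str, Any]:
    """Keyword arguments for the Lake Formation list_permissions operation."""
    return {"Resource": {"Table": {"DatabaseName": database, "Name": table}}}


def describe_and_audit(database: str, table: str) -> Dict[str, Any]:
    """Combine catalog metadata with the grants on a table (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without the SDK

    glue = boto3.client("glue")
    lakeformation = boto3.client("lakeformation")

    metadata = glue.get_table(**glue_get_table_request(database, table))
    grants = lakeformation.list_permissions(
        **lakeformation_list_permissions_request(database, table)
    )
    return {
        "columns": [c["Name"] for c in metadata["Table"]["StorageDescriptor"]["Columns"]],
        # Only the first page of results; NextToken handling is omitted here.
        "grants": grants["PrincipalResourcePermissions"],
    }


# Hypothetical database and table names.
glue_request = glue_get_table_request("sales_db", "orders")
permissions_request = lakeformation_list_permissions_request("sales_db", "orders")
```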

Other AWS services

Lake Formation uses the following services:

  • AWS Glue, to orchestrate jobs and crawlers that transform data using AWS Glue transforms.
  • IAM, to grant permissions policies to Lake Formation principals. The Lake Formation permission model works alongside the IAM permission model to secure your data lake.
Thota nithya
Thota Nithya has been writing cloud computing articles for govindhtech since April 2023. She is a science graduate and a cloud computing enthusiast.