Cloud Composer 3: Next-gen Data Pipeline Orchestration

Any data team will tell you that maintaining Apache Airflow is often hard work, with many hours spent wrestling with security, reliability, and efficient resource scaling. That time would be better spent generating insights and delivering business value.

Google presents Cloud Composer 3, the most recent iteration of its fully managed Apache Airflow service, which is now generally available. Data teams can now expedite time-to-value, lower operational overhead, and streamline workflows with this release, which marks a major leap in data pipeline orchestration.

Cloud Composer: One of the top managed Airflow solutions

Cloud Composer 3 gives data teams a more efficient and more controlled way to create, run, and manage data pipelines. Its improved efficiency, comprehensive security features, and streamlined user interface make it a strong platform for accelerating data-driven projects. Cloud Composer remains the first choice of many customers for orchestrating data and analytics workflows. It is also frequently used to orchestrate AI/ML workloads, supporting the MLOps practices that are essential to an organization’s AI endeavours.

What’s new in Cloud Composer 3?

A number of strong features and enhancements are included in Cloud Composer 3 to streamline data pipeline administration and increase productivity:

  • Simplified networking: Reduce complexity and management overhead by configuring network settings with ease using streamlined options.
  • Evergreen versioning: Get access to new features, security patches, and performance enhancements by staying current with the most recent Cloud Composer versions.
  • Hidden infrastructure: Focus on your data pipelines rather than on infrastructure administration. Cloud Composer 3 takes care of the supporting infrastructure while you create and manage your DAGs.
  • Improved performance and reliability: Benefit from better resource management and infrastructure optimisation.
  • Per-task CPU and memory control: Optimise performance and cost effectiveness by allocating resources for specific workloads.
  • Stronger security posture: Because the infrastructure is hidden in Cloud Composer 3, Google proactively maintains the environment’s security posture using industry best practices.
  • Plus a lot more.

Advantages for data teams

For data scientists, data engineers, and data architects, Cloud Composer 3 offers tangible benefits:

  • Enhanced productivity: With streamlined workflows and simpler environment management, data teams can devote more of their time to their core work.
  • Reduced operational overhead: Automated infrastructure management and evergreen versioning remove routine maintenance work, leaving teams more time to innovate.
  • Faster time-to-value: Improved performance and scalability speed up data pipeline execution, offering faster insights and shorter time to market for data-driven projects.

Migrate environments to Cloud Composer 3

The following steps describe how to move DAGs, data, and configuration from your current Cloud Composer 2 environment to Cloud Composer 3.

Make sure that your DAGs are compatible with Cloud Composer 3

Follow these guidelines to make sure that your DAGs are compatible with Cloud Composer 3:

  • The list of packages in your Cloud Composer 3 environment can differ from the list in your Cloud Composer 2 environment. This can affect whether your DAGs work with Cloud Composer 3.
  • Cloud Composer loads configuration overrides, environment variables, and PyPI packages from the snapshot of your Cloud Composer 2 environment into Cloud Composer 3 without changing or adjusting them for compatibility. If custom PyPI packages cause dependency conflicts, you can skip their installation when you load the snapshot.
  • In Cloud Composer 3, the environment’s cluster is located in the tenant project. Make sure that your DAGs are compatible with this change. In particular, KubernetesPodOperator workloads now scale independently of your environment, and pod affinity configurations are not supported.
  • In Cloud Composer 3, it is not possible to access the Airflow database directly. Make sure that your DAGs are compatible with this change.
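
One way to check the package lists is to compare `name==version` listings (for example, `pip freeze` output) taken from both environments. The sketch below is a minimal illustration using only the Python standard library. The function names and the sample package versions are hypothetical and are not part of Cloud Composer.

```python
def parse_requirements(text):
    """Parse 'name==version' lines into a dict, ignoring blanks and comments."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_packages(old_text, new_text):
    """Return packages missing from, or pinned differently in, the new list."""
    old, new = parse_requirements(old_text), parse_requirements(new_text)
    missing = sorted(name for name in old if name not in new)
    changed = {
        name: (old[name], new[name])
        for name in sorted(old)
        if name in new and old[name] != new[name]
    }
    return missing, changed


# Hypothetical package lists, for illustration only.
composer2 = "apache-airflow==2.7.3\npandas==2.0.3\nmy-internal-lib==1.4.0\n"
composer3 = "apache-airflow==2.9.3\npandas==2.0.3\n"

missing, changed = diff_packages(composer2, composer3)
print("Missing in Cloud Composer 3:", missing)
print("Version changed:", changed)
```

Packages reported as missing or changed are the ones worth testing against your DAGs before you migrate.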

Pause DAGs in your Cloud Composer 2 environment

To prevent duplicate DAG runs, pause every DAG in your Cloud Composer 2 environment before you save its snapshot. Skip the liveness monitoring DAG (airflow_monitoring): it is used only for monitoring and is not included in environment snapshots.

DAGs can be paused using any of the following methods:

  • In the Google Cloud console, pause each DAG individually:
    • Navigate to the Environments page in the Google Cloud console.
    • To see the details of an environment, select it.
    • Navigate to the DAGs tab on the Environment information page.
    • Click on a DAG’s name.
    • Click Pause DAG on the DAG information page.
  • Navigate to DAGs in the Airflow web interface, then pause each DAG manually.
  • To pause every DAG, use the composer_dags script:
python3 composer_dags.py --environment COMPOSER_2_ENV \
  --project PROJECT_ID \
  --location COMPOSER_2_LOCATION \
  --operation pause

Replace:

  • COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
  • PROJECT_ID with the Project ID.
  • COMPOSER_2_LOCATION with the region where the environment is located.
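
If you would rather pause DAGs one at a time from the command line, the following sketch builds one `gcloud composer environments run ... dags pause` command per DAG and leaves out airflow_monitoring. It assumes Airflow 2 CLI syntax. The function name, DAG IDs, and environment details are hypothetical, and the commands are only printed, not executed.

```python
import shlex

MONITORING_DAG = "airflow_monitoring"  # liveness DAG; not part of snapshots


def pause_commands(dag_ids, environment, location, project):
    """Build one gcloud command per DAG, skipping the liveness monitoring DAG."""
    commands = []
    for dag_id in dag_ids:
        if dag_id == MONITORING_DAG:
            continue
        commands.append([
            "gcloud", "composer", "environments", "run", environment,
            "--project", project,
            "--location", location,
            "dags", "pause", "--", dag_id,
        ])
    return commands


# Hypothetical DAG IDs and environment details, for illustration only.
cmds = pause_commands(
    ["daily_sales", "airflow_monitoring", "hourly_ingest"],
    environment="example-environment",
    location="us-central1",
    project="example-project",
)
for cmd in cmds:
    print(shlex.join(cmd))
```

Review the printed commands before running them, for example by passing each list to `subprocess.run`.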

Save a snapshot of your Cloud Composer 2 environment

Console

  • Create a snapshot of your environment:
    • Navigate to the Environments page in the Google Cloud console.
    • In the list of environments, click the name of your Cloud Composer 2 environment. The Environment details page opens.
    • Click Create snapshot.
    • In the Create snapshot dialogue, click Submit. This guide saves the snapshot in the bucket of the Cloud Composer 2 environment, but you can choose a different location. If you specify a custom location, the service accounts of both environments must have read and write permissions for it.
    • Wait until Cloud Composer creates the snapshot.
  • After the snapshot is created, the message that appears shows the snapshot’s location. You need this location later, when you load the snapshot into the Cloud Composer 3 environment.
  • For example, the snapshot location might look like: gs://us-central1-example-916807e1-bucket/snapshots/snapshots_example-project_us-central1_example-environment/2024-05-15T15-23-12.

gcloud

  • Create a snapshot of your Cloud Composer 2 environment:
gcloud composer environments snapshots save \
  COMPOSER_2_ENV \
  --location COMPOSER_2_LOCATION

Replace:

  • COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
  • COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
  • (Optional) Use the --snapshot-location argument to specify a custom location where the environment’s snapshot is stored.
  • This guide saves the snapshot in the bucket of the Cloud Composer 2 environment, but you can choose a different location. If you specify a custom location, the service accounts of both environments must have read and write permissions for it.

After the snapshot is created, the message that appears shows the snapshot’s location. You need this location later, when you load the snapshot into the Cloud Composer 3 environment.

For example, the snapshot location might look like: gs://us-central1-example-916807e1-bucket/snapshots/snapshots_example-project_us-central1_example-environment/2024-05-15T15-23-12.
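
Because the snapshot location is reused in a later step, it can help to check that the value you copied is a well-formed Cloud Storage URI. The sketch below uses only the Python standard library. The function name is hypothetical, and the check covers the URI's shape only, not whether the snapshot actually exists.

```python
from urllib.parse import urlsplit


def split_snapshot_location(uri):
    """Split a gs:// URI into (bucket, object_path); raise ValueError if malformed."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return parts.netloc, parts.path.strip("/")


# Example location taken from the text above.
location = (
    "gs://us-central1-example-916807e1-bucket/snapshots/"
    "snapshots_example-project_us-central1_example-environment/2024-05-15T15-23-12"
)
bucket, path = split_snapshot_location(location)
print(bucket)
print(path)
```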

Create a Cloud Composer 3 environment

Follow these guidelines when you create a Cloud Composer 3 environment:

  • You can start with the same resource limits configuration as in your Cloud Composer 2 environment, and then scale and optimise it further.
  • In Cloud Composer 3 environments, the Airflow DAG processor runs as a separate environment component. Because the DAG processor takes over DAG parsing from the scheduler, you may want to reallocate resources that were previously assigned to Airflow schedulers. You can monitor the performance of the scheduler and the DAG processor after you migrate to Cloud Composer 3.
  • Cloud Composer 3 has a simpler and more streamlined networking configuration than Cloud Composer 2. You can switch between public and private IP networking, and attach and detach VPC networks. You don’t need to specify IP ranges. Configure networking in your Cloud Composer 3 environment to match the setup of your Cloud Composer 2 environment.
  • You don’t need to specify configuration overrides and environment variables at this point, because they are replaced when you load the snapshot of your Cloud Composer 2 environment.

Load the snapshot into your Cloud Composer 3 environment

Console

To load the snapshot into your Cloud Composer 3 environment:

  • Navigate to the Environments page in the Google Cloud console.
  • In the list of environments, click the name of your Cloud Composer 3 environment. The Environment details page opens.
  • Click Load snapshot.
  • In the Load snapshot dialogue, click Browse.
  • Select the folder that contains the snapshot. If you used the default location for this guide, this folder is in the /snapshots folder of your Cloud Composer 2 environment bucket, and its name is the timestamp of the snapshot save operation. This is the same location that was shown in the message after the snapshot was created.
  • Click Load and wait until Cloud Composer loads the snapshot.

gcloud

In your Cloud Composer 3 environment, load the snapshot of your Cloud Composer 2 environment:

gcloud composer environments snapshots load \
  COMPOSER_3_ENV \
  --location COMPOSER_3_LOCATION \
  --snapshot-path "SNAPSHOT_PATH"

Replace:

  • COMPOSER_3_ENV with the name of your Cloud Composer 3 environment.
  • COMPOSER_3_LOCATION with the region where the Cloud Composer 3 environment is located.
  • SNAPSHOT_PATH with the URI of your Cloud Composer 2 environment’s bucket, followed by the path to the snapshot. This is the same location that was shown in the message after the snapshot was created. For instance: gs://us-central1-example-916807e1-bucket/snapshots/example-project_us-central1_example-environment_2024-05-15T15-23-12.
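
The following sketch assembles the load command in Python, including an optional switch for skipping custom PyPI packages when they cause dependency conflicts. The function name is hypothetical, and the `--skip-pypi-packages-installation` flag name is an assumption that you should confirm with `gcloud composer environments snapshots load --help`. The command is only printed, not executed.

```python
import shlex


def load_snapshot_command(environment, location, snapshot_path, skip_pypi=False):
    """Build the argv for `gcloud composer environments snapshots load`."""
    cmd = [
        "gcloud", "composer", "environments", "snapshots", "load", environment,
        "--location", location,
        "--snapshot-path", snapshot_path,
    ]
    if skip_pypi:
        # Assumed flag name; verify it against the gcloud help output.
        cmd.append("--skip-pypi-packages-installation")
    return cmd


# Hypothetical environment name; the snapshot path is the example from the text.
cmd = load_snapshot_command(
    "example-environment-3",
    "us-central1",
    "gs://us-central1-example-916807e1-bucket/snapshots/"
    "example-project_us-central1_example-environment_2024-05-15T15-23-12",
    skip_pypi=True,
)
print(shlex.join(cmd))
```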

Unpause DAGs in the Cloud Composer 3 environment

You can use any of the following options:

  • In the Google Cloud console, unpause each DAG individually:
    • Navigate to the Environments page in the Google Cloud console.
    • To see the details of an environment, select it.
    • Navigate to the DAGs tab on the Environment information page.
    • Click on a DAG’s name.
    • Click Unpause DAG on the DAG information page.
  • Navigate to DAGs in the Airflow web interface, then unpause each DAG manually.
  • To unpause every DAG, use the composer_dags script:
python3 composer_dags.py --environment COMPOSER_3_ENV \
  --project PROJECT_ID \
  --location COMPOSER_3_LOCATION \
  --operation unpause

Replace:

  • COMPOSER_3_ENV with the name of your Cloud Composer 3 environment.
  • PROJECT_ID with the Project ID.
  • COMPOSER_3_LOCATION with the region where the environment is located.

Check for DAG errors

  • Go to DAGs in the Airflow web interface and check for reported DAG syntax errors.
  • Check that DAG runs are scheduled at the correct time.
  • Wait for the DAG runs to happen in the Cloud Composer 3 environment, then check whether they were successful. If a DAG run was successful, don’t unpause that DAG in the Cloud Composer 2 environment; if you do, a DAG run for the same time and date also happens in your Cloud Composer 2 environment.
  • If a particular DAG fails, troubleshoot it until it runs successfully in Cloud Composer 3.
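
The checks above amount to sorting DAGs into three groups: those whose runs all succeeded, those with a failed run, and those still waiting for a result. The sketch below shows that triage over a list of run records. The record layout (`dag_id` and `state` keys) and the sample data are hypothetical; in practice you would read the states from the Airflow web interface or API.

```python
def triage_dag_runs(runs):
    """Group DAG IDs by outcome: all runs succeeded, any run failed, or still pending."""
    states_by_dag = {}
    for run in runs:
        states_by_dag.setdefault(run["dag_id"], set()).add(run["state"])

    succeeded, failed, pending = [], [], []
    for dag_id in sorted(states_by_dag):
        states = states_by_dag[dag_id]
        if "failed" in states:
            failed.append(dag_id)      # troubleshoot in Cloud Composer 3
        elif states == {"success"}:
            succeeded.append(dag_id)   # keep paused in Cloud Composer 2
        else:
            pending.append(dag_id)     # wait for the runs to finish
    return succeeded, failed, pending


# Hypothetical run records, for illustration only.
runs = [
    {"dag_id": "daily_sales", "state": "success"},
    {"dag_id": "hourly_ingest", "state": "success"},
    {"dag_id": "hourly_ingest", "state": "failed"},
    {"dag_id": "weekly_report", "state": "running"},
]
succeeded, failed, pending = triage_dag_runs(runs)
print("Succeeded:", succeeded)
print("Failed:", failed)
print("Pending:", pending)
```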

Monitor your Cloud Composer 3 environment

After you move all DAGs and configuration to the Cloud Composer 3 environment, monitor it for potential issues, failed DAG runs, and overall environment health.

If the Cloud Composer 3 environment runs without problems for a sufficient period of time, consider deleting the Cloud Composer 2 environment.

Thota nithya
Thota Nithya has been writing cloud computing articles for govindhtech since April 2023. She is a science graduate and a cloud computing enthusiast.