Google Generative Model Airflow Operators
Generative AI is rapidly transforming data analytics. Generative models create meaningful content from data, revolutionising data-driven decision-making, and powerful models like Gemini, available on Vertex AI, Google Cloud’s unified AI development platform, are driving this innovation.
Integrating Vertex AI’s generative models into data pipelines orchestrated with Apache Airflow and Cloud Composer, Google Cloud’s fully managed workflow orchestration service, has never been simpler. Version 10.21.0 of the apache-airflow-providers-google package introduces three new Airflow operators for interacting with Vertex AI’s generative models:
- TextGenerationModelPredictOperator
- TextEmbeddingModelGetEmbeddingsOperator
- GenerativeModelGenerateContentOperator
Let’s look at a few example use cases, then demonstrate how to apply the new operators.
Data pipelines powered by generative AI
With this integration, data analytics pipelines can now do new and exciting things, including but not limited to:
- Automated insights: Save data analysts time and resources by generating summaries, reports, and other insightful content directly from raw data.
- Data enrichment: Use generative models to augment datasets with synthetic data, broadening the scope of your analysis and enhancing downstream applications.
- Advanced anomaly detection: Use generative models to find unusual patterns and outliers in your data, improving your anomaly detection systems.
- Text embeddings: Convert a sizeable corpus of unstructured text into a structured format so that it can be objectively compared, analysed, and mined for insights.
- Content generation: Generate DAG metadata such as doc values, tags, and descriptions, or use contextual pipeline awareness to personalise emails, alerts, and other messages.
- Translation: Gemini supports over 35 languages for translating text, files, and other content.
Using the new Airflow operators
To generate a prediction via a language model, you can use TextGenerationModelPredictOperator.
Example:
from airflow.providers.google.cloud.operators.vertex_ai.generative_model import (
    TextGenerationModelPredictOperator,
)

predict_task = TextGenerationModelPredictOperator(
    task_id="predict_task",
    project_id="your-project",
    location="your-location",
    prompt="Explain the difference between a text generation model and a generative model.",
    pretrained_model="text-bison",
)
To generate text embeddings, you can use TextEmbeddingModelGetEmbeddingsOperator.
Example:
from airflow.providers.google.cloud.operators.vertex_ai.generative_model import (
    TextEmbeddingModelGetEmbeddingsOperator,
)

generate_embeddings_task = TextEmbeddingModelGetEmbeddingsOperator(
    task_id="generate_embeddings_task",
    project_id="your-project",
    location="your-location",
    prompt="What are the benefits of generating text embeddings?",
    pretrained_model="textembedding-gecko",
)
To generate content with a generative model, you can use GenerativeModelGenerateContentOperator.
Example:
from airflow.providers.google.cloud.operators.vertex_ai.generative_model import (
    GenerativeModelGenerateContentOperator,
)

generate_content_task = GenerativeModelGenerateContentOperator(
    task_id="generate_content_task",
    project_id="your-project",
    location="your-location",
    contents=[
        "Explain how to integrate Generative AI into an Airflow DAG in 25 words or less."
    ],
    pretrained_model="gemini-1.5-pro",
)
Each operator returns the model’s response in XCom under the `model_response` key.
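A downstream task can then pull that response from XCom. Here is a minimal sketch building on the generate_content_task example above (the log_response_task name and its callable are illustrative):

from airflow.operators.python import PythonOperator

def log_model_response(ti):
    # Pull the response that generate_content_task pushed to XCom.
    response = ti.xcom_pull(task_ids="generate_content_task", key="model_response")
    print(f"Model response: {response}")

log_response_task = PythonOperator(
    task_id="log_response_task",
    python_callable=log_model_response,
)

generate_content_task >> log_response_task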
Possible practical uses
Building on the use cases above, Vertex AI generative models, Apache Airflow, and Google Cloud can be combined to tackle problems such as the following:
Targeted marketing: Use Airflow to schedule and orchestrate the optimisation of email campaigns. Every week or month, extract client data from Google Sheets and store it in Google Cloud Storage, then use a Google Generative Model Airflow Operator to analyse the customer data and generate a variety of customised subject lines and content options for each client segment.
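As a rough sketch of that weekly cadence, assuming a recent Airflow version and illustrative project, prompt, and schedule values (the data-extraction steps are omitted):

import pendulum

from airflow import DAG
from airflow.providers.google.cloud.operators.vertex_ai.generative_model import (
    GenerativeModelGenerateContentOperator,
)

with DAG(
    dag_id="weekly_email_campaign_copy",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="@weekly",
    catchup=False,
) as dag:
    # Illustrative: generate subject-line options for one customer segment.
    generate_subject_lines = GenerativeModelGenerateContentOperator(
        task_id="generate_subject_lines",
        project_id="your-project",
        location="your-location",
        contents=[
            "Write five email subject lines for a loyalty campaign aimed at "
            "customers who have not purchased in 90 days."
        ],
        pretrained_model="gemini-1.5-pro",
    )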
Data cleansing: Create an Airflow DAG that processes batches of raw client data via a staging area in Google Cloud Storage. Use Google Generative Model Airflow Operators to standardise and validate addresses, fixing typos and, where feasible, filling in missing details. Once the data is cleaned, load it into BigQuery and flag any addresses that need manual review.
Identifying anomalies to optimise costs: Set up an Airflow DAG that gathers cloud resource utilisation data from monitoring APIs on a daily or hourly basis. Reference a Google generative model informed by previous usage trends in your Google Generative Model Airflow Operators to analyse the data and spot anomalous spikes in CPU utilisation, network traffic, or storage consumption. Alert the infrastructure team so they can investigate serious anomalies and take appropriate action.
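One possible wiring is sketched below; the fetch_usage_metrics callable is a hypothetical stand-in for your monitoring-API extraction step, and the Jinja expression assumes the operator’s contents field is template-enabled:

from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.vertex_ai.generative_model import (
    GenerativeModelGenerateContentOperator,
)

def fetch_usage_metrics():
    # Hypothetical: pull CPU, network, and storage metrics from your
    # monitoring API and return them as a string for the prompt.
    return "cpu=93%, network=1.2Gbps, storage=78%"

fetch_metrics = PythonOperator(
    task_id="fetch_metrics",
    python_callable=fetch_usage_metrics,
)

analyze_metrics = GenerativeModelGenerateContentOperator(
    task_id="analyze_metrics",
    project_id="your-project",
    location="your-location",
    contents=[
        "Given typical usage patterns, flag anomalous spikes in these "
        "metrics: {{ ti.xcom_pull(task_ids='fetch_metrics') }}"
    ],
    pretrained_model="gemini-1.5-pro",
)

fetch_metrics >> analyze_metrics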
Representing visual content in new ways: Create an Airflow DAG that starts as soon as an image or video file is uploaded to Google Cloud Storage. Use the multimodal capabilities of Google Generative Model Airflow Operators to create tabular data describing these files (e.g. file metadata, time-series object detection, audio transcript data, frame analysis), then load that tabular data into BigQuery to obtain further insights.
Report coalescence: Read hundreds or thousands of related PDF files using Google Generative Model Airflow Operators, then combine them into a summarised report, minimising the need for internal approvals, reviews, and manual document writing. Results can be exported to Google Cloud Storage and evaluated with the Rapid Evaluation API service.
Automating customer service feedback: Export CCAI customer service transcripts to Google Cloud Storage every day. Use Google Generative Model Airflow Operators to examine these transcripts and suggest ways to improve customer support. Results can be exported to BigQuery or emailed to the customer support team each day.
Enriching Airflow DAG and task alerts: When a DAG fails, provide the relevant DAG information and error messages to Google Generative Model Airflow Operators, then use the operator’s response to add contextual understanding to a Cloud Logging log-based alerting policy.
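One way to sketch this is a triage task that only runs when an upstream task fails; the trigger rule and prompt below are illustrative, and the templated fields again assume contents is template-enabled:

from airflow.providers.google.cloud.operators.vertex_ai.generative_model import (
    GenerativeModelGenerateContentOperator,
)
from airflow.utils.trigger_rule import TriggerRule

# Illustrative: runs only if an upstream task failed, and asks the model
# to turn the raw failure context into a readable alert message.
triage_failure = GenerativeModelGenerateContentOperator(
    task_id="triage_failure",
    project_id="your-project",
    location="your-location",
    contents=[
        "Summarise this Airflow failure for an on-call engineer: "
        "DAG {{ dag.dag_id }}, run {{ run_id }}, logical date {{ ds }}."
    ],
    pretrained_model="gemini-1.5-pro",
    trigger_rule=TriggerRule.ONE_FAILED,
)

# some_upstream_task >> triage_failure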
By combining these three potent technologies, companies can create inventive solutions for a wide range of applications and domains.
Apache Airflow Tutorial
What is Apache Airflow?
Apache Airflow is a community-developed platform for programmatically authoring, scheduling, and monitoring workflows.
Fundamentals
Scalable
Apache Airflow uses a message queue and a modular architecture to coordinate any number of workers, making it ready to scale to infinity.
Dynamic
Apache Airflow pipelines are defined in Python, which makes it possible to write code that instantiates pipelines dynamically.
Extensible
It’s simple to define your own operators and extend libraries to the level of abstraction that suits your environment.
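For instance, a custom operator can be as small as this hypothetical sketch:

from airflow.models.baseoperator import BaseOperator

class GreetOperator(BaseOperator):
    """Hypothetical custom operator that logs a greeting."""

    def __init__(self, name: str, **kwargs):
        super().__init__(**kwargs)
        self.name = name

    def execute(self, context):
        # Runs when the task executes; self.log is the standard task logger.
        self.log.info("Hello, %s!", self.name)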
Elegant
Apache Airflow pipelines are lean and explicit. Parametrisation is built into its core using the powerful Jinja templating engine.
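For example, any templated field can embed Jinja expressions; here a BashOperator command is parametrised with the run’s logical date (the task id is illustrative):

from airflow.operators.bash import BashOperator

# "{{ ds }}" renders to the run's logical date at execution time.
templated_task = BashOperator(
    task_id="templated_task",
    bash_command="echo 'Processing data for {{ ds }}'",
)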
Features
Pure Python
No more XML magic or command-line manipulation! Define your workflows using standard Python features, such as loops for dynamic task generation and date/time formats for scheduling. This gives you complete flexibility when building your workflows.
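As a minimal sketch, a plain Python loop can stamp out one task per region (the DAG id, schedule, and regions are illustrative):

import pendulum

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="dynamic_tasks_example",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule="@daily",
    catchup=False,
) as dag:
    for region in ["us", "eu", "apac"]:
        # Each iteration instantiates a separate task.
        PythonOperator(
            task_id=f"process_{region}",
            python_callable=lambda r=region: print(f"Processing {r}"),
        )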
Practical User Interface
Schedule and monitor your workflows through a robust, modern web application. Forget outdated, cron-like interfaces: you always have full visibility into the status and logs of completed and in-progress tasks.
Sturdy Integrations
Apache Airflow provides numerous plug-and-play operators ready to execute your tasks on Google Cloud Platform, Amazon Web Services, Microsoft Azure, and many other third-party services. This makes it easy to integrate Airflow with both existing infrastructure and emerging technologies.
Easy to Use
Anybody with Python knowledge can deploy a workflow, and your pipelines are not limited to building machine learning models. With Apache Airflow, you can also move data, manage your infrastructure, and much more.
Open-Source
Wherever you’d like to share your improvement, you can open a PR to do so. That’s all there is to it: no barriers, no lengthy procedures. Airflow has many active users who willingly share their experiences.