Creating reusable and modular feature preparation in BigQuery ML
Feature engineering, a preprocessing phase in machine learning, is crucial for turning raw input into useful features. BigQuery ML has advanced significantly in this regard, giving data scientists and ML developers a flexible range of preprocessing functions for feature engineering. These transformations can also be embedded directly in models, guaranteeing their portability from BigQuery to serving environments such as Vertex AI. BigQuery ML now expands on this by offering modularity: feature engineering captured as a standalone, reusable unit. This enables simple reuse of feature pipelines inside BigQuery and direct transfer to Vertex AI.
Preprocessing features using the TRANSFORM clause
A TRANSFORM clause may be included in the CREATE MODEL statement when building a model in BigQuery ML. This enables the use of preprocessing functions to specify custom transformations of columns from the SELECT query into model features. Because the statistics used for the transformations are computed from the data used to create the model, this is a major benefit.
This offers consistency of preprocessing comparable to other frameworks, notably the Transform component of the TFX framework, which helps avoid training/serving skew. Even without a TRANSFORM clause, automatic transformations are applied depending on the model type and data type.
In the example below, taken from the accompanying tutorial, preprocessing steps are applied before input to impute missing values. Additionally, preprocessing is embedded using the TRANSFORM clause to scale the columns. This scaling is applied to the already-imputed input data and is built into the model. A benefit of the embedded scaling functions is that the model saves the computed scaling parameters and applies them later when the model is used for inference.
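A minimal sketch of such a statement is shown below. The dataset name `bqml_tutorial`, the model name, and the constant imputation values are hypothetical; the penguins table is BigQuery's public sample dataset. The TRANSFORM clause's scaling statistics are computed on the training data and stored with the model:

```sql
CREATE OR REPLACE MODEL `bqml_tutorial.penguins_lr`
  TRANSFORM (
    -- Scaling parameters (mean/stddev) are computed from the training data
    -- and saved in the model for reuse at inference time.
    ML.STANDARD_SCALER(culmen_length_mm) OVER () AS culmen_length_scaled,
    ML.STANDARD_SCALER(flipper_length_mm) OVER () AS flipper_length_scaled,
    body_mass_g
  )
  OPTIONS (
    model_type = 'linear_reg',
    input_label_cols = ['body_mass_g']
  )
AS
SELECT
  -- Impute missing values before the TRANSFORM clause sees them
  -- (illustrative constants, not the tutorial's actual values).
  IFNULL(culmen_length_mm, 44.0) AS culmen_length_mm,
  IFNULL(flipper_length_mm, 201.0) AS flipper_length_mm,
  body_mass_g
FROM `bigquery-public-data.ml_datasets.penguins`
WHERE body_mass_g IS NOT NULL;
```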
Reusable preprocessing with the ML.TRANSFORM function
The new ML.TRANSFORM table function gives direct access to the feature engineering portion of a model. This enables several useful workflows, such as using one model’s transformations to transform another model’s inputs.
In the example below, the ML.TRANSFORM function is applied directly to the input data, eliminating the need to recompute the scaling parameters from the original training data. This makes it easy to reuse the transformations for subsequent models, for further data analysis, and for skew and drift detection in model monitoring computations.
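A sketch of this call, reusing the hypothetical model above; ML.TRANSFORM outputs the columns produced by the model's TRANSFORM clause:

```sql
-- Apply a model's stored transformations (with their saved statistics)
-- directly to new data, without retraining or recomputation.
SELECT *
FROM ML.TRANSFORM(
  MODEL `bqml_tutorial.penguins_lr`,
  TABLE `bigquery-public-data.ml_datasets.penguins`
);
```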
Modular preprocessing using TRANSFORM_ONLY models
To take reusability to the next level of modularity, create transform-only models. This works like any other model: use CREATE MODEL with a TRANSFORM statement and the option model_type = 'TRANSFORM_ONLY'. In other words, it produces a model object that consists only of the feature engineering portion of the pipeline. That means the transform model can be used to transform the inputs of any CREATE MODEL statement, and it can also be registered in the Vertex AI Model Registry for use in ML pipelines outside of BigQuery. For full portability, the model can also be exported to Cloud Storage.
The TRANSFORM statement is assembled into a model with a standard CREATE MODEL statement. In this instance, all of the imputation steps are saved in a single model object, which remembers the training data’s mean and median values and applies them to impute subsequent records, even at inference time.
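A sketch of such a transform-only model, assuming the hypothetical `bqml_tutorial` dataset; the choice of 'mean' versus 'median' strategies per column is illustrative:

```sql
-- A transform-only model: no predictor is trained, only the imputation
-- logic and the statistics it learned (means/medians) are stored.
CREATE OR REPLACE MODEL `bqml_tutorial.penguins_imputer`
  TRANSFORM (
    ML.IMPUTER(culmen_length_mm, 'mean') OVER () AS culmen_length_mm,
    ML.IMPUTER(flipper_length_mm, 'median') OVER () AS flipper_length_mm
  )
  OPTIONS (model_type = 'TRANSFORM_ONLY')
AS
SELECT culmen_length_mm, flipper_length_mm
FROM `bigquery-public-data.ml_datasets.penguins`;
```

The resulting model can then be applied to any input with ML.TRANSFORM, or its output fed into another CREATE MODEL statement.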
Feature pipelines
Thanks to their modularity, many TRANSFORM_ONLY models can be combined in a feature pipeline. The feature pipeline remains quite readable thanks to the WITH clause (CTEs) of BigQuery SQL syntax. This concept allows feature-level transformation models to be used easily and flexibly, much like a feature store.
As an illustration of this concept, create a TRANSFORM_ONLY model for each of the following features: body_mass_g, culmen_length_mm, culmen_depth_mm, and flipper_length_mm. In this instance, each one scales a column into a feature, much like the full model built first above.
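A sketch of chaining two such per-feature models with CTEs; the model names are hypothetical, and each model's TRANSFORM clause is assumed to pass the remaining columns through so the next step can consume them:

```sql
-- Chain per-feature TRANSFORM_ONLY models into a readable pipeline.
WITH
  step1 AS (
    SELECT *
    FROM ML.TRANSFORM(
      MODEL `bqml_tutorial.scale_body_mass_g`,
      TABLE `bigquery-public-data.ml_datasets.penguins`)
  ),
  step2 AS (
    SELECT *
    FROM ML.TRANSFORM(
      MODEL `bqml_tutorial.scale_flipper_length_mm`,
      (SELECT * FROM step1))
  )
SELECT * FROM step2;
```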
However, there are situations in which models must be used beyond the data warehouse, such as edge applications or online predictions. Note how the VERTEX_AI_MODEL_ID option was used to build the models above. This means they have already been automatically registered in the Vertex AI Model Registry and are nearly ready to deploy to a Vertex AI Prediction Endpoint. Additionally, for full portability, these models, like all BigQuery ML models, can be exported to Cloud Storage with the EXPORT MODEL statement.
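A sketch of both paths, with hypothetical model, registry ID, and bucket names; the registration options shown are assumptions based on BigQuery ML's CREATE MODEL options:

```sql
-- Register the transform model in the Vertex AI Model Registry at creation.
CREATE OR REPLACE MODEL `bqml_tutorial.scale_body_mass_g`
  TRANSFORM (
    ML.STANDARD_SCALER(body_mass_g) OVER () AS body_mass_g_scaled
  )
  OPTIONS (
    model_type = 'TRANSFORM_ONLY',
    model_registry = 'vertex_ai',
    vertex_ai_model_id = 'scale_body_mass_g'  -- hypothetical registry ID
  )
AS
SELECT body_mass_g FROM `bigquery-public-data.ml_datasets.penguins`;

-- Export the model to Cloud Storage for full portability.
EXPORT MODEL `bqml_tutorial.scale_body_mass_g`
OPTIONS (URI = 'gs://your-bucket/models/scale_body_mass_g');
```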
BigQuery ML’s new reusable and modular feature engineering can simplify building and maintaining machine learning pipelines and power MLOps. Modular preprocessing lets you build transform-only models that can be exported to Vertex AI or used as building blocks for other models. This modularity even makes feature pipelines possible directly in SQL. This can simplify maintenance, prevent training/serving skew, increase accuracy, and save you time.