BigQuery DataFrames Generative AI Goldmines!

December 9, 2023

Generative AI in BigQuery DataFrames turns customer feedback into opportunities

To run a successful business, you must understand your customers’ needs and learn from their feedback. However, extracting actionable information from that feedback is hard. Examining and categorizing feedback can help you identify your customers’ pain points with a product, but the work becomes slower and harder as the volume of feedback grows.

Several new generative AI and ML capabilities in Google Cloud can help you build a scalable solution to this problem by allowing you to analyze unstructured customer feedback and identify top product issues.

This blog post shows how to build a solution to turn raw customer feedback into actionable intelligence.

Our solution segments and summarizes customer feedback narratives from a large dataset. The BigQuery Public Dataset of the CFPB Consumer Complaint Database will be used to demonstrate this solution. This dataset contains diverse, unstructured consumer financial product and service complaints.

The core Google Cloud capabilities we’ll use to build this solution are:

Text-bison foundation model: a large language model trained on massive text and code datasets. It can generate text, translate languages, write creative content, and answer questions. It is available through Generative AI on Vertex AI.

Textembedding-gecko model: an embedding model that converts text into numerical vectors that machine learning algorithms can consume. The vector representations capture the semantics and context of the text. It is also available through Generative AI on Vertex AI.

The BigQuery ML K-means model clusters data for segmentation. K-means is unsupervised, so model training and evaluation don’t require labels or data splitting.

BigQuery DataFrames for ML and generative AI. BigQuery DataFrames, an open-source Python client, compiles popular Python APIs into scalable BigQuery SQL queries and API calls to simplify BigQuery and Google Cloud interactions.

With BigQuery DataFrames, data scientists can move from data exploration to production by deploying Python code as BigQuery programmable objects and integrating it with data engineering pipelines, BigQuery ML, Vertex AI, LLMs, and other Google Cloud services.

Build a feedback segmentation and summarization solution

You can follow along by copying the notebook, which uses BigQuery DataFrames to cluster and characterize complaints and runs in Colab against your own Google Cloud project.

Data loading and preparation

To use BigQuery DataFrames, import its pandas-compatible module, bigframes.pandas, and set the Google Cloud project and location for the BigQuery session.

Once the complaints table is loaded into a DataFrame, you can manipulate and transform it with bigframes.pandas just as you would with pandas, but the calculations run in BigQuery instead of your local environment. BigQuery DataFrames supports 400+ pandas functions. The list is in the documentation.

This solution isolates the DataFrame’s consumer_complaint_narrative column, which contains the original complaint as unstructured text, and drops rows with NULL values for that field using the pandas-style dropna() method.
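The loading and filtering steps described above might look like the following minimal sketch. It assumes the bigframes package is installed and that the session can authenticate to your project. The table path, the "US" location, and the helper names select_narratives and load_complaint_narratives are assumptions made for illustration and are not taken from the original notebook.

```python
NARRATIVE_COLUMN = "consumer_complaint_narrative"
# Assumed path of the CFPB public table; verify it in the BigQuery console.
CFPB_TABLE = "bigquery-public-data.cfpb_complaints.complaint_database"


def select_narratives(df):
    """Keep only the complaint text column and drop rows where it is NULL."""
    return df[[NARRATIVE_COLUMN]].dropna()


def load_complaint_narratives(project_id, location="US"):
    """Read the public table through BigQuery DataFrames and filter it."""
    # Imported lazily so this file can be loaded without bigframes installed.
    import bigframes.pandas as bpd

    # Session options have to be set before the first query runs.
    bpd.options.bigquery.project = project_id
    bpd.options.bigquery.location = location

    return select_narratives(bpd.read_gbq(CFPB_TABLE))
```

Calling load_complaint_narratives("your-project-id") would return a BigQuery DataFrame whose filtering runs in BigQuery rather than locally.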

Embed text

Before a clustering model can be applied to unstructured text, the text must be converted into embeddings, that is, numerical vectors. Fortunately, BigQuery DataFrames can create these embeddings with the textembedding-gecko model through the PaLM2TextEmbeddingGenerator class.

This model is imported and used to create embeddings for each row of the DataFrame, creating a new DataFrame with the embedding and unstructured text.
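A hedged sketch of the embedding step follows. The class name PaLM2TextEmbeddingGenerator comes from the article and may be missing from newer bigframes releases. The sampling step, the helper name embed_narratives, and its parameters are assumptions for illustration. Depending on the bigframes version, predict may already return the input text alongside the embedding, in which case the final join is unnecessary.

```python
def embed_narratives(issues_df, model=None, sample_size=10_000, seed=None):
    """Sample complaints, embed them, and join the embeddings to the text."""
    if model is None:
        # Lazy import: only needed when no model object is passed in.
        from bigframes.ml.llm import PaLM2TextEmbeddingGenerator

        model = PaLM2TextEmbeddingGenerator()

    # Work on a random subset to keep embedding cost and time bounded.
    sampled = issues_df.sample(n=sample_size, random_state=seed)

    # One embedding row is produced per input row.
    embeddings = model.predict(sampled)

    # Index-aligned join puts text and embedding side by side.
    return sampled.join(embeddings)
```

Accepting the model as a parameter keeps the function easy to exercise with a stand-in object before spending quota on real embedding calls.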

K-means training

You can train the k-means model with the 10,000 complaint text embeddings.

K-means clustering is an unsupervised machine learning algorithm that divides data points into a predefined number of clusters. It repeatedly assigns each point to its nearest cluster center and moves each center to the mean of its assigned points, which minimizes the distance between data points and their cluster centers and tends to keep the clusters well separated.

The bigframes.ml package provides a k-means model. You import the model, train it on the embeddings with 10 clusters, and then predict the cluster for each complaint in the DataFrame.
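To make the algorithm concrete, here is a small pure-Python sketch of the classic assign-then-update loop (Lloyd's algorithm) on four toy 2D points. It only illustrates the idea and is not how BigQuery ML implements k-means. The function names and the toy data are invented for this example.

```python
import math


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centers.append(old)  # an empty cluster keeps its old center
            continue
        new_centers.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
        )
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update(points, labels, centers)
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

With complaint embeddings, the points would be high-dimensional vectors instead of 2D coordinates, but the loop is the same.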
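The training and prediction steps could be sketched as follows. This assumes a DataFrame that holds an embedding column named "text_embedding" and assumes that the prediction output carries the cluster assignment in a column named "CENTROID_ID". Both column names, along with the helper name cluster_complaints, are assumptions and should be checked against the output of your bigframes version.

```python
def cluster_complaints(combined_df, n_clusters=10, model=None,
                       embedding_column="text_embedding",
                       centroid_column="CENTROID_ID"):
    """Fit k-means on the embeddings and attach a cluster id to each row."""
    if model is None:
        # Lazy import: only needed when no model object is passed in.
        from bigframes.ml.cluster import KMeans

        model = KMeans(n_clusters=n_clusters)

    # K-means is unsupervised, so only the feature column is passed in.
    features = combined_df[[embedding_column]]
    model.fit(features)

    # Predict on the same rows to get each complaint's cluster.
    assignments = model.predict(features)

    # Join only the cluster id to avoid overlapping column names.
    return combined_df.join(assignments[[centroid_column]])
```

Filtering the result on the cluster id column then gives the complaints that belong to any one cluster.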

LLM prompt

You now have ten groups of complaints. How do the complaints in each cluster differ? A large language model (LLM) can explain these differences. This example uses the LLM to compare the complaints in two clusters.

The LLM prompt must be prepared first. Take five complaints each from clusters #1 and #2 and join them with a string of text asking the LLM to find the biggest difference.
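The prompt assembly described above can be sketched in plain Python. The exact wording of the question, the list titles, and the helper names are assumptions for illustration. The ask_llm helper relies on PaLM2TextGenerator, a class that may be missing from newer bigframes releases, and assumes the model reads its input from a column named "prompt".

```python
def numbered_list(title, comments):
    """Format comments as a titled, numbered list."""
    lines = [f"{title}:"]
    lines += [f"{i}. {text}" for i, text in enumerate(comments, start=1)]
    return "\n".join(lines)


def build_comparison_prompt(cluster_a, cluster_b, per_cluster=5):
    """Join two lists of complaints with a question about their difference."""
    list_1 = numbered_list("comment list 1", cluster_a[:per_cluster])
    list_2 = numbered_list("comment list 2", cluster_b[:per_cluster])
    question = ("Please highlight the most obvious difference between "
                "the two lists of comments:")
    return f"{question}\n{list_1}\n{list_2}"


def ask_llm(prompt, model=None):
    """Send one prompt to a text generation model via BigQuery DataFrames."""
    # Lazy imports so the prompt helpers work without bigframes installed.
    import bigframes.pandas as bpd

    if model is None:
        from bigframes.ml.llm import PaLM2TextGenerator

        model = PaLM2TextGenerator()

    # The model is given a one-row DataFrame holding the prompt text.
    return model.predict(bpd.DataFrame({"prompt": [prompt]}))
```

Here cluster_a and cluster_b are ordinary Python lists of complaint strings, for example the first five narratives of each cluster pulled into local memory.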

The LLM provided a clear and insightful assessment of how the two clusters differ. You could extend this solution to generate insights and summaries for the complaints in every cluster.
