A Gemini feature in BigQuery, the BigQuery Studio data canvas provides a graphical interface for analysis processes and natural language prompts for finding, transforming, querying, and visualising data.
A directed acyclic graph (DAG) is used by BigQuery data canvas for analysis workflows, giving you a graphical representation of your workflow. Working with many branches of inquiry in one location and iterating on query results are both possible with BigQuery data canvas.
BigQuery data canvas
The BigQuery data canvas is intended to support you on your path from data to insights. Working with data doesn’t require technical expertise of particular products or technologies. Using natural language, BigQuery data canvas and Dataplex metadata combine to find relevant tables.
Gemini in BigQuery is used by BigQuery data canvas to locate your data, build charts, create SQL, and create data summaries.
Capabilities
BigQuery data canvas lets you do the following:
- Use keyword search syntax along with Dataplex metadata to find assets such as tables, views, or materialized views.
- Use natural language for basic SQL queries such as the following:
- Queries that contain
FROM
clauses, math functions, arrays, and structs. JOIN
operations for two tables.
- Queries that contain
- Visualize data by using the following types graphic types:
- Bar chart
- Heat map
- Line graph
- Pie chart
- Scatter chart
- Create custom visualizations by using natural language to describe what you want.
- Automate data insights.
Limitations
- Natural language commands might not work well with the following:
- BigQuery ML
- Apache Spark
- Object tables
- BigLake
INFORMATION_SCHEMA
views- JSON
- Nested and repeated fields
- Complex functions and data types such as
DATETIME
andTIMEZONE
- Data visualizations don’t work with geomap charts.
A ground-breaking data analytics tool, BigQuery data canvas, a Gemini in BigQuery feature, streamlines the whole data analysis process from data preparation and discovery to analysis, visualisation, and collaboration – all in one location, all within BigQuery. You may ask questions in both plain English and a variety of other languages about your data using the BigQuery data canvas, which makes use of natural language processing.
Because sophisticated SQL queries don’t need to be developed using this easy method, data analysis is now accessible to both technical and non-technical people. You may examine, modify, and display your BigQuery data using data canvas without ever leaving the environment in which it is stored.
This blog post provides a technical walkthrough of a real-world scenario utilising the public github_repos dataset, along with an overview of BigQuery data canvas. Over 3TB of activity from 3M+ open-source repositories are included in this dataset. We’ll look at how to respond to inquiries like:
- In a year, how many commits were made to a particular repository?
- In a particular year, who authored the most repositories?
- Over time, how many non-authored commits were applied?
- Which users, at what time, contributed to a certain file?
You’ll see how data canvas manages intricate SQL operations from your natural language prompts, such as joining tables, extracting particular data items, unnesting fields, and converting timestamps. We’ll even show you how to use just one click to create intelligent summaries and visualisations.
BigQuery data canvas quickly overview
BigQuery data canvas is mostly used for three types of tasks: finding data, generating SQL, and generating insights.
Find Data
To locate data in BigQuery using a rapid keyword search or a natural language text prompt, use data canvas.
Generate SQL
Additionally, you may use the BigQuery data canvas to have SQL code written for you using natural language prompts powered by Gemini.
Create Insights
At last, use a single click to uncover insights concealed within your data! Gemini creates visualisations for you automatically so you can see the story your data is telling.
Using the BigQuery data canvas
Let’s look at an example to help you better understand the potential impact that the BigQuery data canvas can have in your company. Businesses of all kinds, from big corporations to tiny startups, can gain from having a better grasp of the productivity of their development staff. Google Cloud will demonstrate in this in-depth technical tutorial how to leverage data canvas and the public dataset github_repos to provide insightful results in a shared workspace.
You’ll learn how data canvas simplifies the creation of sophisticated SQL queries by working through this example, which demonstrates how to create joins and unnested columns, convert timestamps, extract the month and year from date fields, and more. Gemini’s features make it simple to create these queries and use natural language to examine your data with illuminating visualisations.
Please be aware that using any LLM-enabled application successfully requires strong prompt engineering abilities, just like using many of the new AI products and services available today. Many people might believe that large language models (LLMs) aren’t very excellent at producing SQL right out of the box. However, in our experience, Gemini in BigQuery via data canvas may produce sophisticated SQL queries using the context of your data corpus if you use the appropriate prompting mechanisms. It is evident that data canvas uses natural language queries to decide the ordering, grouping, sorting, record count limitation, and SQL structure.
The github_repos dataset, which is 3TB+ in size and can be found in Bigquery Public Datasets, comprises information in numerous tables regarding commits, watch counts, and other activity on 3M+ open-source projects. We want to look at the Google Cloud Platform repository for this example. As always, before you begin, make sure you have the necessary IAM permissions. In addition, make sure you have the necessary rights to access the datasets and data canvas in order to run nodes properly.
Using data canvas makes it simple to explore every table in the github_repos dataset. Here, Google Cloud evaluate schema, details, and preview data in one panel while comparing datasets side by side.
After choosing your dataset, you can hover over the bottom of the node to branch it to query or join it with another table. The dataset for the following transformation node is shown by arrows. For clarity, you can give each node a name when sharing the canvas. You can delete, debug, duplicate, or run all of the nodes in a series using the options in the upper right corner. Results can be downloaded, and data can be exported to Looker Studio or Sheets. In the navigation panel, you can also inspect the DAG structure, restore previous versions, and rate SQL suggestions.
Google examine four main facets of their data while examining the github_repos dataset. They will attempt to ascertain the following:
1) The total number of commitments made in a single year
2) The quantity of written repos for a specific year
3) The total number of non-authored commits that were applied throughout time
4) Determine how many user commits there have been for a specific file at a specific time.
Utilise BigQuery data canvas to simplify data analysis
It might be challenging to interpret data for a new project or use case when working with large datasets that span multiple disciplines. This procedure can be streamlined by using data canvas. Data canvas helps you work more efficiently and quickly by streamlining data analysis using natural language-based SQL creation and visualisations. It also reduces the need for repetitive queries and lets you plan automatic data refreshes.