Saturday, July 27, 2024

Leveraging Gemini 1.0 Pro Vision in BigQuery

Gemini 1.0 Pro Vision in BigQuery

Organisations are producing more unstructured data, such as documents, audio files, videos, and photographs, as digital devices and platforms like social media, mobile devices, and Internet of Things sensors become widespread. To help you interpret this data and derive valuable insights from it, Google Cloud has introduced BigQuery integrations with Vertex AI over the last several months. These integrations make use of Gemini 1.0 Pro, PaLM, Vision AI, Speech AI, Doc AI, Natural Language AI, and more.

Although Vision AI can classify images and recognise objects, large language models (LLMs) open up new visual use cases. With Gemini 1.0 Pro Vision, Google Cloud is extending the BigQuery and Vertex AI integrations to support multimodal generative AI use cases. You can use Gemini 1.0 Pro Vision directly in BigQuery to analyse images and videos by combining them with custom text prompts in familiar SQL queries.

Multimodal capabilities in a data warehouse can improve how you process unstructured data across a range of use cases:

  • Object recognition: Answer questions about the fine-grained identification of objects in images and videos.
  • Information retrieval: Combine existing knowledge with information extracted from images and videos.
  • Captioning and description: Generate descriptions of images and videos at varying levels of detail.
  • Understanding digital content: Answer questions by extracting information from web pages, infographics, charts, figures, and tables.
  • Structured content generation: Generate responses in formats such as HTML and JSON, following the instructions in the prompt.

Turning unstructured data into structured data

With only minor prompt changes, Gemini 1.0 Pro Vision can return structured responses in easily consumable formats such as HTML or JSON for use in downstream tasks. Having structured data in a data warehouse like BigQuery lets you use the results in SQL operations and combine them with other structured datasets for deeper analysis.

Consider, for instance, that you have a large dataset of vehicle images and need to know a few basic details about the vehicle in each image. This is a use case where Gemini 1.0 Pro Vision can help.

Combining text and image into a prompt for Gemini 1.0 Pro Vision, with a sample response
Image credit: Google Cloud

As you can see, Gemini has answered in great detail. However, the free-form format and extra information are more useful to a person than to a data warehouse. Rather than turning unstructured data into more unstructured data, you can adjust the prompt to tell the model to return a structured response.

Adjusting the text portion of the prompt to request a structured response from Gemini 1.0 Pro Vision, with a sample result
Image credit: Google Cloud

You can see how much more useful this response would be in an environment like BigQuery.

Let’s now look at how to ask Gemini 1.0 Pro Vision to run this analysis over hundreds of images directly in BigQuery.

Gemini 1.0 Pro Vision Access via BigQuery ML

BigQuery and Gemini 1.0 Pro Vision are integrated through the ML.GENERATE_TEXT() function. To enable this function in your BigQuery project, you must create a remote model that represents a hosted Vertex AI large language model. Thankfully, this takes only a few lines of SQL.
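The remote model definition might look like the following minimal sketch. The dataset `mydataset`, the model name, and the connection `us.bqml_llm_connection` are placeholder names assumed for illustration. The sketch also assumes a Cloud resource connection already exists and that its service account is allowed to call Vertex AI.

```sql
-- Placeholder names: replace the dataset, model name, and connection
-- with resources from your own project.
CREATE OR REPLACE MODEL `mydataset.gemini_pro_vision_model`
  REMOTE WITH CONNECTION `us.bqml_llm_connection`
  OPTIONS (ENDPOINT = 'gemini-pro-vision');
```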

Once the model is created, you can generate text by combining your data with the ML.GENERATE_TEXT() function in your SQL queries.
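A query along these lines could run the structured prompt against every image. It is a sketch rather than a definitive recipe: `mydataset.gemini_pro_vision_model` stands for the remote model and `mydataset.vehicle_images` for an object table of car images, both hypothetical names, and the prompt wording is only one way to ask for JSON.

```sql
SELECT
  uri,                                         -- Cloud Storage path of each image
  ml_generate_text_llm_result AS brand_model_year
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.gemini_pro_vision_model`,
    TABLE `mydataset.vehicle_images`,
    STRUCT(
      'What is the brand, model, and year of this car? Answer in JSON format with three keys: brand, model, year. brand and model should be strings, year should be an integer.' AS prompt,
      -- Return the generated text as a plain string column instead of raw JSON.
      TRUE AS flatten_json_output));
```

With `flatten_json_output` set to TRUE, the generated text arrives in the `ml_generate_text_llm_result` column, which keeps later parsing simple.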

A few notes on the syntax of the ML.GENERATE_TEXT() function when it points to a gemini-pro-vision model endpoint:

TABLE: Takes an object table as input, which can contain different types of unstructured objects (e.g. images, videos).

PROMPT: Takes a single string text prompt, placed inside the option STRUCT, and applies it to each object, row by row, in the object TABLE. This differs from the gemini-pro model, where the prompt is supplied as a column of the input data.
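The object table passed as TABLE has to exist before the query runs. The article does not show how it is created, so the following is only a sketch of one way to define it over a Cloud Storage bucket. The table name, connection, and bucket path are hypothetical.

```sql
-- Hypothetical names and bucket path, for illustration only.
CREATE OR REPLACE EXTERNAL TABLE `mydataset.vehicle_images`
  WITH CONNECTION `us.bqml_llm_connection`
  OPTIONS (
    object_metadata = 'SIMPLE',                  -- marks this external table as an object table
    uris = ['gs://my-bucket/vehicle-images/*']); -- one row per matching object
```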

You can add more SQL to this query to extract the brand, model, and year into new columns for later use.
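One possible shape for that extraction is sketched below. It assumes the model answers with a JSON object containing `brand`, `model`, and `year` keys, possibly wrapped in extra text, and it reuses the hypothetical model and table names `mydataset.gemini_pro_vision_model` and `mydataset.vehicle_images`. The output table `mydataset.vehicle_details` is likewise an assumed name.

```sql
CREATE OR REPLACE TABLE `mydataset.vehicle_details` AS
WITH raw_result AS (
  SELECT
    uri,
    -- Keep only the outermost {...} span in case the model adds text around the JSON.
    REGEXP_EXTRACT(ml_generate_text_llm_result, r'(?s)\{.*\}') AS json_text
  FROM
    ML.GENERATE_TEXT(
      MODEL `mydataset.gemini_pro_vision_model`,
      TABLE `mydataset.vehicle_images`,
      STRUCT(
        'What is the brand, model, and year of this car? Answer in JSON format with three keys: brand, model, year. brand and model should be strings, year should be an integer.' AS prompt,
        TRUE AS flatten_json_output))
)
SELECT
  uri,
  JSON_VALUE(json_text, '$.brand') AS brand,
  JSON_VALUE(json_text, '$.model') AS model,
  -- SAFE_CAST yields NULL instead of an error if the year is not a clean integer.
  SAFE_CAST(JSON_VALUE(json_text, '$.year') AS INT64) AS year
FROM raw_result;
```

Extracting the JSON span with a regular expression is a defensive choice, since generative models do not always return bare JSON even when asked to.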

The responses are now parsed into new, structured columns.

There you have it. A collection of unlabeled, raw images has just been turned into structured data suitable for data warehouse analysis. Consider joining this new table with other relevant business data. For instance, with a dataset of historical car sales you could compute the average or median selling price for comparable cars over a recent period. These are just a few of the possibilities that open up when you bring unstructured data into your data operations.
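As an illustration of that kind of join, the sketch below assumes the structured results live in a table named `mydataset.vehicle_details` and that a sales table `mydataset.car_sales` exists with `brand`, `model`, `year`, `sale_price`, and `sale_date` columns. Both tables and the 90-day window are hypothetical.

```sql
-- Hypothetical tables and columns, for illustration only.
SELECT
  d.brand,
  d.model,
  d.year,
  AVG(s.sale_price) AS avg_sale_price,
  -- APPROX_QUANTILES(x, 2) returns [min, approximate median, max].
  APPROX_QUANTILES(s.sale_price, 2)[OFFSET(1)] AS approx_median_sale_price
FROM `mydataset.vehicle_details` AS d
JOIN `mydataset.car_sales` AS s
  ON s.brand = d.brand
  AND s.model = d.model
  AND s.year = d.year
WHERE s.sale_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY d.brand, d.model, d.year;
```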

Keep a few things in mind before you start using Gemini 1.0 Pro Vision in BigQuery:

  • To run Gemini 1.0 Pro Vision model inference over an object table, you need an Enterprise or Enterprise Plus reservation.
  • Quotas apply to Vertex AI large language models (LLMs) and Cloud AI services, so review the current quota for the Gemini 1.0 Pro Vision model.

Next steps

Integrating generative AI directly into BigQuery has several advantages. Instead of building data pipelines and custom Python code between BigQuery and the generative AI model APIs, you can now accomplish the same tasks with a few lines of SQL. BigQuery manages the infrastructure and scales from one prompt to hundreds.

Drakshi
Since June 2023, Drakshi has been writing articles on artificial intelligence for govindhtech. She holds a postgraduate degree in business administration and is an artificial intelligence enthusiast.