Unlock large-scale multimodal search: combining the power of text and images with Vertex AI. Users' search habits are changing. When looking for a product, people may use images or natural-sounding text, and they expect personalised, query-specific results in return. Developers need robust multimodal search systems to meet these expectations.
In this post, we'll walk through a powerful method for building a multimodal search engine on Google Cloud's Vertex AI platform. Using an ensemble approach with weighted Reciprocal Rank Fusion (RRF), we'll combine the strengths of Vertex AI Vector Search with Vertex AI Search. This method enables:
- A better experience for users: Finding the “perfect” keywords is less important as searching becomes more natural.
- Improved product discovery: Users can find products they might not have found with keywords alone.
- Increased conversion rates: More relevant and engaging search results lead to higher revenue and happier customers.
Why a combined approach matters
Consider how you search for things online. Say you're looking for "white marble countertops" or "homes with a large backyard." Some of this information may be available only in images, while other details are stored as text. When you search for a product, you want the system to use both modalities.
One strategy might be to ask a large language model (LLM) to produce a written description of each image. However, this can add latency for your customers and be difficult to maintain over time. Instead, you can use image embeddings directly and merge those results with text-based results from Vertex AI Search. Combined, this multimodal strategy yields:
- Richer visual comprehension: Multimodal embeddings go beyond basic text annotations to capture the complex visual relationships and elements in images.
- Image-based queries: Letting users search directly with an image enables more natural discovery based on visual cues.
- Precise filtering: Filtering by specific parameters such as size, layout, materials, and features makes the search highly precise and produces tailored results.
The Vertex AI platform from Google Cloud offers a full suite of resources for creating and implementing machine learning solutions, including robust search features:
- Vertex AI Search: A highly scalable, feature-rich search engine for a wide variety of queries. It supports advanced features including synonyms, faceting, filtering, and custom relevance ranking, and it provides advanced document parsing, notably for unstructured documents (PDFs), including those with embedded visuals such as tables and infographics.
- Vertex AI multimodal embeddings API: Generates image embeddings, numerical representations of images.
- Vertex AI Vector Search: A vector database that stores embeddings alongside searchable metadata. It can hold both dense and sparse embeddings for content such as text descriptions and images.
An ensemble strategy: the power of text and images
To build the multimodal search engine, we'll use an ensemble technique that combines the strengths of Vertex AI Search for text with Vertex AI Vector Search for images:
Vertex AI Search for text search
- Using Vertex AI Agent Builder, index your product catalogue data (names, descriptions, and attributes) into a data store.
- When a user enters a text query, Vertex AI Search uses semantic understanding, keyword matching, and any custom ranking rules you specify to return relevant products.
- It can also return facets that can be used for filtering.
- You can even inspect how complex or unstructured documents are chunked and parsed.
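As a rough sketch, a text query against Vertex AI Search can be assembled as a JSON request body. The field names below mirror the Discovery Engine REST search API (`query`, `pageSize`, `facetSpecs`), but treat the exact shape as an assumption and verify it against the current API reference:

```python
def build_text_search_request(query, page_size=10, facet_fields=()):
    """Assemble an illustrative Vertex AI Search request body.

    The field names follow the Discovery Engine REST search body;
    check the current API reference before relying on this shape.
    """
    body = {"query": query, "pageSize": page_size}
    if facet_fields:
        # Request facet counts for each listed field so the UI
        # can offer filters alongside the results.
        body["facetSpecs"] = [{"facetKey": {"key": f}} for f in facet_fields]
    return body


# Example: a query over a product catalogue, faceted by material.
request = build_text_search_request("white marble countertops",
                                    facet_fields=["material"])
```

The returned dictionary would then be posted to your data store's serving config endpoint using your preferred HTTP or client library.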
Using vector embeddings for image search
- Use the multimodal embeddings API to generate image embeddings for your products.
- Store these embeddings in Vertex AI Vector Search.
- To find visually similar product images, convert user-uploaded text or images into embeddings and query the vector database.
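The similarity lookup in the steps above can be illustrated with plain cosine similarity. In production, the embeddings would come from the multimodal embeddings API and the nearest-neighbour search would be handled by Vertex AI Vector Search at scale; this is a minimal sketch with toy vectors:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_products(query_embedding, product_embeddings, top_k=2):
    # product_embeddings maps product IDs to (toy) image embeddings;
    # rank products by similarity to the query embedding, best first.
    ranked = sorted(product_embeddings.items(),
                    key=lambda item: cosine_similarity(query_embedding, item[1]),
                    reverse=True)
    return [product_id for product_id, _ in ranked[:top_k]]
```

For example, with a query embedding of `[1, 0, 0]` and products whose embeddings point in similar directions, the most visually similar product IDs are returned first.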
Using weighted RRF to combine results
- Reciprocal Rank Fusion (RRF): A technique for merging multiple ranked lists. Each item is scored by the reciprocal of its rank in each list (typically 1/(k + rank)), so items that rank highly across lists rise to the top of the fused result.
- Weighted RRF: Assign weights to the image similarity results (from Vertex AI Vector Search) and the text relevance results (from Vertex AI Search). This lets you tune how much each modality contributes to the final ranking.
- Ensemble: Combine the text and image search results, rerank them by their weighted RRF scores, and present the blended list to the user.
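The fusion step above can be sketched in a few lines. This is a minimal weighted-RRF implementation, assuming each retriever returns an ordered list of product IDs and using the conventional smoothing constant k = 60:

```python
def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked result lists with weighted Reciprocal Rank Fusion.

    ranked_lists: one ranked list of IDs per retriever (best first),
                  e.g. [text_results, image_results].
    weights:      one weight per list, controlling each modality's
                  contribution to the final ranking.
    k:            smoothing constant from the standard RRF formula.
    """
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance adds weight / (k + rank) to the item's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Example: favour text relevance (0.7) over image similarity (0.3).
text_results = ["p1", "p2", "p3"]
image_results = ["p3", "p1", "p4"]
fused = weighted_rrf([text_results, image_results], [0.7, 0.3])
```

Here `p1` wins because it ranks well in both lists, while `p3` (first in image search but third in text search) lands second.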

Use the faceting features of Vertex AI Agent Builder Search to improve the search experience:
- Define the facets: Based on your product data, create facets for categories, attributes (colour, size, material), price ranges, and so on.
- Dynamic filtering: Let users refine their searches with these facets to narrow the results to the most relevant products. "Dynamic" means the filters adjust automatically based on the results that are returned.
- Natural language query understanding: If your textual data is structured, you can enable natural-language query understanding in Vertex AI Agent Builder search to improve results. You can then parse the filters out of the response and apply the same filters to Vertex AI Vector Search using namespaces.
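Once filters have been parsed from the search response, they can be mapped onto Vector Search's namespace-based restricts. The helper below is a sketch: the restrict shape (a namespace plus an allow list) follows Vector Search's documented filtering model, but the field names in the example are hypothetical:

```python
def to_restricts(parsed_filters):
    """Map {field: [allowed values]} onto namespace-style restricts
    for Vector Search filtering (shape is an approximation; check
    the current Vector Search filtering documentation).
    """
    return [{"namespace": field, "allow_list": sorted(values)}
            for field, values in parsed_filters.items()]

# Example: filters recovered from a natural-language query such as
# "white marble countertops" (field names are illustrative).
restricts = to_restricts({"material": ["marble"], "color": ["white"]})
```

Attaching these restricts to the vector query keeps the image results consistent with the filters already applied on the text side.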
Why this strategy is effective
This method gives developers the best of both worlds: the rich capabilities of Vertex AI Search (such as its parsing pipeline) combined with the ability to use images directly as queries. It is also flexible and customisable, since you can adjust the weights in your RRF ensemble and tune individual components to meet your unique requirements.
Most importantly, this approach lets your users search seamlessly with text, images, or both, together with dynamic filtering options for more focused results.
Start using multimodal search now
By combining text and image search with a robust ensemble approach on Vertex AI, you can create a highly effective and engaging search experience for your customers. Begin by:
Explore Vertex AI
Review the documentation to learn more about the capabilities of Vertex AI Search and embedding generation.
Try out different embeddings
Experiment with several image embedding models and fine-tune them on your data as necessary.
Implement weighted RRF
Design your scoring algorithm and test different weights to optimise your search results.
Natural language query understanding
Use the built-in features of Vertex AI Agent Builder search to create filters on structured data, then apply the same filters to Vertex AI Vector Search.
Vector search filters
Apply filters to your image embeddings to give users even more flexibility.