How Vertex AI’s vector search unlocks high-performance gen AI apps
What is vector search?
Vector search is essential for developers who need to build apps that are blazingly fast, handle enormous datasets, and stay cost-effective even during major traffic surges. However, building with this technology can be quite difficult, particularly for advanced AI applications that demand exceptional speed, scale, and adaptability.
Vertex AI Vector Search Overview
Vector Search is built on vector search technology developed by Google Research. With Vector Search, you get the same technology and research that underpin Google products such as Search, YouTube, and Google Play.
Vector Search can search across billions of semantically similar or related items. A vector similarity-matching service has many applications, including search engines, recommendation engines, chatbots, and text classification.
Vector search example
One potential use for Vector Search is an online retailer with hundreds of thousands of apparel items in stock. The retailer could create embeddings of these items with the multimodal embedding API, then use Vector Search to match text queries to the most semantically similar product images. For instance, a user could type “yellow summer dress” and Vector Search would return the most similar results. Vector Search operates at scale with high queries per second (QPS), high recall, low latency, and cost efficiency.
Embeddings aren’t limited to words or text. Semantic embeddings can be produced for many types of data, including images, audio, video, and user preferences.
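As an illustrative sketch (not the retailer’s actual pipeline), here is how such image and text embeddings might be produced with the Vertex AI SDK’s multimodal embedding model; the project ID and image file name are hypothetical:

```python
# A minimal sketch of generating multimodal embeddings for catalog images,
# assuming the Vertex AI SDK and the "multimodalembedding@001" model.
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")  # hypothetical project

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# Embed a product image so it can be matched against text queries later.
image = Image.load_from_file("dress_001.jpg")  # hypothetical local file
image_embedding = model.get_embeddings(image=image).image_embedding

# Embed a shopper's text query into the same vector space.
query_embedding = model.get_embeddings(
    contextual_text="yellow summer dress"
).text_embedding
```

Because both embeddings live in the same vector space, the text query can be matched directly against the indexed image embeddings.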
How semantic matching works in Vector Search
Semantic matching involves only a few steps. The first step is to create embedding representations of your items (done outside of Vector Search). Second, you upload your embeddings to Google Cloud and connect your data to Vector Search. Once your embeddings are added to Vector Search, you can create an index and run queries to get results or recommendations.
Create an embedding
Create an embedding for your data. This entails preparing the data so that the approximate nearest neighbours (ANN) search is effective. You can create the embedding with Generative AI on Vertex AI, or independently of Vertex AI. With Generative AI on Vertex AI, you can create both text and multimodal embeddings.
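For example, a minimal sketch of creating text embeddings with Generative AI on Vertex AI might look like the following; the model name and project details are assumptions that may vary by release:

```python
# A minimal sketch of creating text embeddings with Generative AI on Vertex AI,
# assuming the "text-embedding-004" model (model names vary by release).
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")  # hypothetical project

model = TextEmbeddingModel.from_pretrained("text-embedding-004")
embeddings = model.get_embeddings(["yellow summer dress", "blue winter coat"])

for embedding in embeddings:
    print(len(embedding.values))  # each result is a float vector, e.g. 768 dimensions
```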
Upload your embedding to Cloud Storage
Upload your embedding to Cloud Storage so that the Vector Search service can access it.
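Vector Search accepts embeddings in a JSON-lines format in which each record carries an id and an embedding. A minimal sketch, assuming a hypothetical bucket name and toy three-dimensional vectors:

```python
# A sketch of writing embeddings in the JSON-lines format Vector Search accepts
# and uploading the file to a Cloud Storage bucket (bucket name is hypothetical).
import json
from google.cloud import storage

records = [
    {"id": "dress_001", "embedding": [0.01, -0.42, 0.33]},  # truncated example vector
    {"id": "coat_017", "embedding": [0.25, 0.11, -0.08]},
]

with open("embeddings.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

client = storage.Client()
bucket = client.bucket("your-embeddings-bucket")  # hypothetical bucket
bucket.blob("contents/embeddings.json").upload_from_filename("embeddings.json")
```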
Connect to Vector Search
Connect your embeddings to Vector Search to perform a nearest neighbour search. Your embedding is used to create an index, which you then deploy to an index endpoint for querying. The query returns the approximate nearest neighbours.
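A condensed sketch of this flow with the Vertex AI SDK; the display names, Cloud Storage path, and dimensions are illustrative placeholders that continue the earlier example:

```python
# A condensed sketch of creating an ANN index from the Cloud Storage embeddings,
# deploying it to an endpoint, and querying it (names and values are illustrative).
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Build a Tree-AH index from the embeddings uploaded earlier.
index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="apparel-index",
    contents_delta_uri="gs://your-embeddings-bucket/contents",  # hypothetical path
    dimensions=3,  # must match your embedding dimensionality
    approximate_neighbors_count=150,
)

# Create an endpoint and deploy the index to it.
endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="apparel-endpoint",
    public_endpoint_enabled=True,
)
endpoint.deploy_index(index=index, deployed_index_id="apparel_deployed")

# Query with an embedding; the approximate nearest neighbours come back.
response = endpoint.find_neighbors(
    deployed_index_id="apparel_deployed",
    queries=[[0.01, -0.42, 0.33]],  # a query embedding
    num_neighbors=10,
)
```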
Evaluate the results
Once you receive the approximate nearest neighbour results, you can evaluate how well they satisfy your needs. If the results aren’t precise enough, you can adjust the algorithm’s settings or scale up to handle more queries per second. You do this by editing your configuration file, which defines your index. See Configure index parameters for further information.
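For instance, when building a Tree-AH index through the SDK, parameters like the following influence the recall/latency trade-off; the values shown are illustrative, not recommendations:

```python
# A sketch of recall/latency tuning knobs exposed when building a Tree-AH index
# (values are illustrative; see "Configure index parameters" for guidance).
from google.cloud import aiplatform

index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="apparel-index-tuned",
    contents_delta_uri="gs://your-embeddings-bucket/contents",  # hypothetical path
    dimensions=3,
    approximate_neighbors_count=150,
    leaf_node_embedding_count=1000,   # embeddings stored per leaf node of the tree
    leaf_nodes_to_search_percent=10,  # raise for higher recall, lower for lower latency
)
```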
Terminology used in Vector Search
To use Vector Search, you should be familiar with the following key terms:
- Vector: A vector is a list of float values that has magnitude and direction. It can represent any kind of data, such as numbers, points in space, and directions.
- Embedding: An embedding is a type of vector that captures the semantic meaning of data. Embeddings are typically produced with machine learning algorithms and are widely used in natural language processing (NLP) and other machine learning applications.
- Dense embeddings: These represent the semantic meaning of text, using arrays that mostly contain non-zero values. With dense embeddings, similar search results can be returned based on semantic similarity.
- Sparse embeddings: In contrast to dense embeddings, sparse embeddings represent text syntax using high-dimensional arrays with very few non-zero values. Sparse embeddings are frequently used for keyword searches.
- Hybrid search: Hybrid search uses both dense and sparse embeddings, letting you combine semantic and keyword searches. Vector Search supports dense embedding search; sparse embeddings and hybrid search are available as public preview features.
- Index: An index is a collection of vectors deployed together for similarity search. Vectors can be added to or removed from an index, and similarity search queries run against a specific index.
- Ground truth: Ground truth refers to verifying machine learning results against the real world, for example with a ground truth dataset of exact nearest neighbours.
- Recall: The proportion of true nearest neighbours among the nearest neighbours that the index returns. For instance, if a nearest neighbour query for 20 nearest neighbours returned 19 of the ground truth nearest neighbours, the recall is 19/20×100 = 95%.
- Restrict: A feature that uses Boolean rules to limit searches to a subset of the index; restricting is also called “filtering.” Vector Search supports both text attribute filtering and numeric filtering, as shown in the sketch after this list.
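As a hedged sketch of the restrict feature, assuming the indexed vectors were tagged with a hypothetical “color” namespace and reusing the deployed endpoint from the earlier example:

```python
# A sketch of restricting (filtering) a query to a subset of the index using
# token restricts; assumes each indexed vector was tagged with a "color" namespace.
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (
    Namespace,
)

# `endpoint` is a deployed MatchingEngineIndexEndpoint, as in the earlier sketch.
response = endpoint.find_neighbors(
    deployed_index_id="apparel_deployed",
    queries=[[0.01, -0.42, 0.33]],
    num_neighbors=10,
    filter=[Namespace(name="color", allow_tokens=["yellow"])],  # only yellow items
)
```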
How does Vertex AI vector search work?
Suppose you run a popular online retailer. To keep customers satisfied, your search engine must quickly sort through millions of products and return relevant results, even during periods of high demand. Vector search is one method for finding similar items in large datasets. It works by transforming data such as text or images into embeddings: numerical representations that capture the semantic meaning of the content and enable more accurate, relevant search results.
It can be used for a variety of purposes: the e-commerce example above, a recommendation system that provides tailored suggestions based on user preferences, or a retrieval-augmented generation (RAG) system for generative AI agents, which grounds responses in your data.
Impact of Vertex AI’s vector search in the real world
Customers are achieving impressive results with Vertex AI vector search. Here are four notable ways this technology is helping them build high-performing gen AI applications.
The fastest vector search for highly responsive applications
In search, recommendation, and gen AI applications, fast response times are essential to meeting user expectations. Numerous studies link faster response times to higher revenue, conversion, and retention rates.
Vector search is designed for cost-effectiveness and exceptionally low latency at high quality. Google tested vector search on a dataset of one billion vectors and found that it could scale up to 5K queries per second while maintaining ultra-low latency (9.6 ms at P95) and high recall (0.99). Thanks to these extremely low latencies, Vertex AI vector search delivers fast, relevant results no matter how big the dataset or how many simultaneous queries the system receives.
Extremely scalable for applications of any size
Another crucial factor when building production-ready applications is your application’s capacity to accommodate growing user bases and data volumes.
Vertex AI vector search is highly scalable for applications of any size and can readily handle unexpected spikes in demand. It scales to billions of embeddings and hundreds of thousands of queries per second while maintaining extremely low latency.
Up to four times more cost-effective
In addition to maintaining performance at scale, Vertex AI vector search is up to four times more affordable than competing alternatives, particularly for high-performance applications. Thanks to Vertex AI vector search’s ANN index, you need far less compute to get fast, relevant results at scale.
| Dataset | QPS | Recall | Latency (P95) |
|---|---|---|---|
| GloVe 1M / 100 dim | 44,876 | 0.96 | 3 ms |
| OpenAI 5M / 1536 dim | 2,981 | 0.96 | 9 ms |
| Cohere 10M / 768 dim | 3,144 | 0.96 | 7 ms |
| LAION 100M / 768 dim | 2,997 | 0.96 | 9 ms |
| BigANN 10M / 128 dim | 33,921 | 0.97 | 3.5 ms |
| BigANN 100M / 128 dim | 9,871 | 0.97 | 7.2 ms |
| BigANN 1B / 128 dim | 4,967 | 0.99 | 9.6 ms |
Highly adaptable to all kinds of applications
In some situations, developers may choose to trade latency for increased recall, or vice versa. For instance, a research database may prioritise thorough results even if they take a little longer, while an e-commerce website may prioritise speed for prompt product recommendations. Vector search lets you adjust these parameters to achieve higher recall or lower latency, depending on your business goals.
Vector search also has auto-scaling capabilities, adjusting to maintain performance as the deployment’s load increases. In Google’s tests, as QPS climbed from 1K to 5K, vector search auto-scaled while maintaining consistent latency and high recall.
To manage larger throughput, developers can also increase the number of replicas and choose alternative machine types to balance performance and cost. This adaptability means vector search can serve a variety of purposes beyond semantic search, such as chatbots, recommendation systems, multimodal search, anomaly detection, and image similarity matching.
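As a sketch of these deployment-time knobs, reusing the index and endpoint objects from the earlier example; the machine type and replica counts are illustrative, not recommendations:

```python
# A sketch of trading cost for throughput at deployment time: pick a machine type
# and a replica range (values are illustrative examples).
endpoint.deploy_index(
    index=index,
    deployed_index_id="apparel_deployed_scaled",
    machine_type="e2-standard-16",  # example machine type
    min_replica_count=2,            # baseline serving capacity
    max_replica_count=10,           # headroom for traffic spikes via auto-scaling
)
```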
Going further with hybrid search
Although dense embedding-based semantic search excels at understanding context and meaning, it has a drawback: it cannot find items the embedding model doesn’t understand. Items such as product numbers, business codenames, or newly coined phrases are invisible to semantic search because the embedding model cannot grasp their meanings.
With Vertex AI vector search’s hybrid search, building this kind of advanced search engine is no longer a difficult undertaking. Developers can readily combine both dense and sparse embeddings, representing semantic meaning and keyword relevance respectively, into a single index. This simplified process lets high-performance search apps, fully tailored to specific business requirements, be developed and deployed quickly.
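A hedged sketch of issuing a hybrid query with the SDK’s public-preview HybridQuery type; all embedding values and the ranking weight are illustrative, and `endpoint` is the deployed endpoint from the earlier sketch:

```python
# A sketch of a hybrid query mixing a dense (semantic) embedding with a sparse
# (keyword) embedding; assumes the SDK's public-preview HybridQuery type.
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (
    HybridQuery,
)

query = HybridQuery(
    dense_embedding=[0.01, -0.42, 0.33],     # semantic meaning
    sparse_embedding_values=[0.8, 0.2],      # keyword relevance weights
    sparse_embedding_dimensions=[1543, 92],  # indices of the non-zero values
    rrf_ranking_alpha=0.5,                   # blend of dense vs. sparse ranking
)

response = endpoint.find_neighbors(
    deployed_index_id="apparel_deployed",
    queries=[query],
    num_neighbors=10,
)
```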
Vertex AI vector search pricing
Vector Search pricing for the Approximate Nearest Neighbour service has the following components:
- Pricing per node per hour for every virtual machine (VM) hosting a deployed index.
- A charge for building new indexes, updating existing ones, and using streaming index updates.
Data processed during index building and updating is measured in binary gigabytes (GiB), where one GiB equals 1,073,741,824 bytes. This unit is also called a gibibyte.
Vector Search charges $3.00 per GiB of data processed, in every region. For Streaming Update inserts, Vector Search charges $0.45 per GiB.
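As a worked example at these rates, an initial index build over 50 GiB of embedding data would cost 50 × $3.00 = $150.00, and streaming in 2 GiB of updates would add 2 × $0.45 = $0.90, both on top of the per-node hourly charge for the VMs serving the index.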