Thursday, April 17, 2025

LeanVec Improves Out-of-Distribution Vector Search Accuracy

Intel LeanVec Solves Vector Search Challenges with Smart Dimensionality Reduction

Vector search has emerged as a crucial component of applications that demand highly accurate and fast responses, as explained in the previous article in this series. High vector dimensionality puts memory and compute pressure on vector search systems and frequently results in unsatisfactory performance. Cross-modal retrieval tasks, such as a user entering a text query to find the most relevant images for that query, have also grown in popularity.

Achieving high accuracy is difficult because these queries frequently have statistical distributions that differ from those of the database embeddings (i.e., the queries are out of distribution). LeanVec, introduced by Intel, speeds up vector search on high-dimensional vectors while maintaining accuracy on out-of-distribution queries by combining dimensionality reduction with vector quantisation.

Introduction

Thanks to deep learning models, the ability to generate high-dimensional embedding vectors whose spatial similarities mirror the semantic affinities of the inputs (such as images, audio, video, text, genomics, and computer code) has improved significantly in recent years. This capability has enabled a wide range of applications to search for semantically relevant results across enormous collections of vectors by retrieving the nearest neighbours of a given query vector. Despite the remarkable recent advances in similarity search, however, modern vector indices perform noticeably worse as dimensionality rises.

The most common are graph indices, in which each vertex represents a dataset vector and directed edges represent neighbour relationships between vectors. The graph can then be traversed efficiently to locate the nearest neighbours in sub-linear time.
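For intuition, here is a minimal sketch of the greedy best-first traversal such graph indices typically use; the graph layout, entry point, and beam width are illustrative placeholders, not details of any specific Intel implementation.

```python
import heapq
import numpy as np

def greedy_graph_search(graph, vectors, query, entry_point, beam_width=10, k=5):
    """Greedy best-first traversal of a proximity graph (maximum inner product).

    graph:   dict mapping vertex id -> list of neighbour ids
    vectors: (N, D) array of database vectors
    query:   (D,) query vector
    """
    start_score = float(vectors[entry_point] @ query)
    visited = {entry_point}
    frontier = [(-start_score, entry_point)]   # min-heap keyed on negative score
    results = [(start_score, entry_point)]     # best vertices seen so far

    while frontier:
        neg_score, v = heapq.heappop(frontier)
        # Stop when the best unexplored candidate cannot improve the kept results.
        if len(results) >= beam_width and -neg_score < min(results)[0]:
            break
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            score = float(vectors[u] @ query)
            heapq.heappush(frontier, (-score, u))
            results.append((score, u))
            if len(results) > beam_width:
                results.remove(min(results))
    return [u for _, u in sorted(results, reverse=True)[:k]]
```

Each hop of this traversal touches a database vector at an essentially random memory location, which is why memory latency and bandwidth dominate the cost at high dimensionality.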

While graph-based indices deliver excellent performance and accuracy at modest dimensionalities (D ≈ 100), they are less effective at the dimensionalities that deep learning models commonly output (D ≈ 512, 768, 1536). In a future where most similarity search deployments use vectors derived from deep learning models, closing this performance gap is critical.

The main cause of this performance drop in graph search is the system’s memory latency and bandwidth, most of which is spent retrieving database vectors from memory in a random-like access pattern. Although compressing the vectors seems like a natural way to reduce the memory load, existing vector compression methods such as PQ and ScaNN are ill-suited here: they either do not compress enough or perform poorly under erratic memory access patterns.

The Challenge with Out-of-Distribution Queries

When the statistical distributions of the database and query vectors diverge, vector compression becomes an even harder problem (the queries are then out-of-distribution, or OOD; when the distributions match, the queries are in-distribution, or ID). Unfortunately, there are two notable situations in which this happens regularly in contemporary applications. The first is cross-modal querying, where a user retrieves related elements of one modality using a query from another modality; for example, text2image retrieval uses text queries to find thematically similar images. The second is when the queries and the database vectors are produced by different models, as in question-answering applications.

The two-dimensional example that follows demonstrates the importance of query-aware dimensionality reduction for maximum inner product search. A query-agnostic approach (such as PCA) would project the database (𝒳) and query (Q) vectors onto the first principal axis of 𝒳 (large green arrow). Because this direction is orthogonal to the principal axis of Q (orange arrow), this choice yields a poor resolution of the inner products. Moreover, the direction that actually carries useful information (the second principal axis of 𝒳) is lost.

Using a query-aware technique, we can select a direction that maximally preserves the inner products. In this case the second principal direction of 𝒳 and the principal direction of Q coincide and provide the best choice.
Image credit to Intel
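A minimal numerical sketch of this intuition, using synthetic data and axis-aligned principal directions chosen purely for illustration (not Intel's figures):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D data: the database X varies mostly along the first axis,
# while the queries Q vary mostly along the second axis (an OOD setting).
X = rng.normal(size=(1000, 2)) * np.array([10.0, 3.0])
Q = rng.normal(size=(1000, 2)) * np.array([0.5, 5.0])

def mean_inner_product_error(direction):
    """Mean absolute error of inner products after projecting both sets onto `direction`."""
    d = direction / np.linalg.norm(direction)
    exact = Q @ X.T
    approx = (Q @ d)[:, None] * (X @ d)[None, :]
    return float(np.abs(exact - approx).mean())

query_agnostic = np.array([1.0, 0.0])   # first principal axis of X (the PCA choice)
query_aware    = np.array([0.0, 1.0])   # second axis of X, aligned with Q's principal axis

print("query-agnostic error:", mean_inner_product_error(query_agnostic))
print("query-aware error:   ", mean_inner_product_error(query_aware))
# The query-aware direction preserves the inner products far better.
```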

A Lightweight Technique for Dimensionality Reduction

LeanVec speeds up similarity search for deep learning embedding vectors by approximating the inner product between a database vector x and a query q: the exact inner product ⟨q, x⟩ is replaced by ⟨DRquery(q), LVQ(DRDB(x))⟩, computed on reduced, quantised representations.

The LeanVec inner-product approximation.
Image credit to Intel

Here is how the projection works: DRquery and DRDB reduce the vector dimensionality (the number of entries in each vector), while LVQ is a locally-adaptive vector quantisation technique that lowers the number of bits per entry. As shown in the figure below, LeanVec down-projects the query and database vectors using the linear functions DRquery and DRDB.

A schematic depiction of the LeanVec framework.
Image credit to Intel

First, LeanVec creates two compressed representations of each database vector x (see the sketch after this list):

  • A primary vector, LVQ(DRDB(x)): a compact representation for which the inner-product approximation ⟨DRquery(q), LVQ(DRDB(x))⟩ ≈ ⟨q, x⟩ is fast but only moderately accurate.
  • A secondary vector, LVQ(x): a representation for which the inner-product approximation ⟨q, LVQ(x)⟩ ≈ ⟨q, x⟩ is highly accurate.
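To make the two representations concrete, here is a toy sketch in which a uniform scalar quantiser stands in for LVQ and a single random orthonormal projection stands in for the learned DRDB and DRquery matrices; all names and numbers are illustrative assumptions rather than SVS internals.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 768, 160                          # original and reduced dimensionality (illustrative)

# Stand-in for the learned projections: LeanVec learns separate, query-aware matrices
# DRquery and DRDB, but in this toy sketch one random orthonormal projection plays both roles.
P = np.linalg.qr(rng.normal(size=(D, d)))[0].T        # (d, D) with orthonormal rows
DR_db = DR_query = P

def quantise(v, bits=8):
    """Uniform scalar quantiser, a crude stand-in for LVQ (which adapts to each vector)."""
    lo, hi = v.min(), v.max()
    scale = (hi - lo) / (2 ** bits - 1)
    return np.round((v - lo) / scale) * scale + lo    # de-quantised values

x = rng.normal(size=D)                   # a database vector
q = rng.normal(size=D)                   # a query vector

primary = quantise(DR_db @ x)            # LVQ(DRDB(x)): compact, drives graph build and search
secondary = quantise(x)                  # LVQ(x): used only to re-rank search candidates

fast_estimate = (DR_query @ q) @ primary # cheap approximation of <q, x> in d dimensions
accurate_estimate = q @ secondary        # near-exact approximation used for re-ranking
print(fast_estimate, accurate_estimate, q @ x)
```

Note that a random d-dimensional subspace shrinks inner products by roughly d/D on average; LeanVec's learned, query-aware projections are designed precisely to avoid that loss.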

The graph is constructed and searched using the primary vectors; notably, Intel's experimental findings demonstrate that graph construction is robust to both the dimensionality reduction and the LVQ quantisation. The secondary vectors are used only during search.

During search, the graph index is traversed using the primary vectors. Their smaller memory footprint shortens the time it takes to retrieve each vector from memory, and their lower dimensionality reduces the computational effort, since fewer fused multiply-add operations are needed. This approximation is well suited to the random memory-access pattern of graph search because it allows efficient inner-product computations with individual database vectors (no batch processing is needed).

Intel compensates for the errors in this inner-product approximation by retrieving more candidates than requested and re-ranking them with the secondary vectors to return the top-k. The runtime overhead is small because the dimensionality reduction of the query (i.e., computing DRquery(q)) is performed only once per search.
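Putting the two phases together, a hedged sketch of the search flow might look like this (the graph_search callable and all names here are placeholders, not the SVS API):

```python
import numpy as np

def leanvec_style_search(q, DR_query, primary_vecs, secondary_vecs, graph_search,
                         k=10, oversample=5):
    """Two-phase search: traverse the graph with the reduced primary vectors, then
    re-rank a larger candidate pool with the full-dimensional secondary vectors."""
    q_reduced = DR_query @ q                      # computed once per query (small overhead)

    # Phase 1: graph traversal over the primary vectors, over-fetching candidates.
    candidates = np.asarray(graph_search(q_reduced, primary_vecs, k * oversample))

    # Phase 2: re-rank the candidates with the secondary vectors and keep the top-k.
    scores = secondary_vecs[candidates] @ q
    return candidates[np.argsort(-scores)[:k]]
```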

Keep in mind that search is a crucial step in graph construction. As a result, the acceleration Intel achieved in search translates directly into faster graph construction.

LeanVec learns the projection functions DRquery and DRDB from data using new mathematical optimisation techniques. These algorithms are computationally efficient: their running time is determined by the number of dimensions rather than the number of vectors. They also take into account the statistical distributions of the database vectors and of a small sample of representative query vectors.
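As a rough, hedged stand-in for these learned projections, the following sketch shows one simplified query-aware heuristic: keep the leading eigenvectors of a combined query-plus-database second-moment matrix, so that directions that matter to either distribution are retained. This is an illustrative baseline under assumed names and shapes, not LeanVec's actual optimisation.

```python
import numpy as np

def query_aware_projection(X, Q, d):
    """Simplified query-aware heuristic (not LeanVec's optimisation): keep the d
    leading eigenvectors of a combined second-moment matrix of the database
    vectors X (n_db x D) and a query sample Q (n_q x D). Returns a (d, D) matrix."""
    C_db = X.T @ X / len(X)                           # database second-moment matrix, D x D
    C_q = Q.T @ Q / len(Q)                            # query second-moment matrix, D x D
    C = C_db / np.trace(C_db) + C_q / np.trace(C_q)   # normalise so neither set dominates
    eigvals, eigvecs = np.linalg.eigh(C)
    top = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, top].T                          # top-d eigenvectors as projection rows

# Example usage with random stand-in data.
rng = np.random.default_rng(2)
X = rng.normal(size=(10_000, 768))                    # database sample
Q = rng.normal(size=(500, 768))                       # small sample of representative queries
P = query_aware_projection(X, Q, d=160)               # here P plays the role of both DRDB and DRquery
```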

Findings

The outcomes speak for themselves. SVS performs better with LeanVec than without it, even surpassing the best open-source implementation of an algorithm widely regarded as the most performant (HNSWlib). The reduction in per-query memory bandwidth consumption yields an almost 4-fold gain in query throughput at the same level of recall (95% 10-recall@10).

Search throughput results in queries per second on HNSWlib, SVS without LeanVec, and SVS with LeanVec.
Image credit to Intel

Conclusion

LeanVec speeds up similarity search on the high-dimensional vectors produced by contemporary embedding models by combining linear dimensionality reduction with vector quantisation. It performs especially well when queries are out of distribution, as is the case, for example, in text2image and question-answering systems.
