OpenCLIP Image Search
Building High-Performance Image Search with Intel Data Center GPU Max, Chroma, and OpenCLIP
This field guide for firms pursuing lean, high-throughput LLM pipelines was put together after Intel Liftoff mentors and AI developers took a thorough look at the Intel Data Center GPU Max 1100 and Intel Tiber AI Cloud.
All of the development, testing, and benchmarking described in this article was carried out on Intel Tiber AI Cloud.
Intel Tiber AI Cloud is a managed cloud platform built specifically to give AI startups and developers scalable, affordable access to Intel’s latest AI hardware, including the most recent Intel Xeon Scalable CPUs, the Intel Data Center GPU Max Series, and Intel Gaudi 2 (and Gaudi 3) accelerators. For startups focused on building and deploying compute-intensive AI models, it offers a performance-optimized environment and removes the substantial upfront hardware investment barrier.
If you are an AI startup interested in exploring the potential of Intel Data Center GPU Max, Intel Gaudi accelerators, and the optimized environment of Intel Tiber AI Cloud for your own projects, get in touch with the Intel Liftoff for AI Startups program. The program supports startups with resources, technical expertise, and access to platforms like Intel Tiber AI Cloud.
AI-powered applications increasingly rely on multimodal data such as text, audio, and images. This article shows how to use Chroma and OpenCLIP embeddings to build and query a multimodal database that contains both text and images.
These embeddings make efficient comparison and retrieval across modalities possible. The goal of the project is to build a system that can store image data and query it with text-based searches, while using GPU or XPU acceleration for better performance.
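As a starting point, the device-selection pattern used throughout the article looks roughly like the sketch below. It assumes Intel Extension for PyTorch (IPEX) is installed so that PyTorch exposes the xpu backend; the helper name pick_device is illustrative.

```python
# Minimal sketch of the GPU/XPU device-selection pattern (assumes IPEX is
# installed, which registers the "xpu" backend for Intel GPUs).
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (enables torch.xpu)

def pick_device() -> torch.device:
    """Prefer an Intel XPU (e.g. a Data Center GPU Max) and fall back to CPU."""
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")

device = pick_device()
print(f"Running on: {device}")
```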
The Intel Data Center GPU Max 1100: Powering Advanced AI
The performance described in this article, and in particular the acceleration delivered by Intel Extension for PyTorch (IPEX), is made possible by powerful hardware such as the Intel Data Center GPU Max Series. The Max 1100 GPU is available both as dedicated instances and in the free Intel Tiber AI Cloud JupyterLab environment:
Xe-HPC Compute Architecture:
- Xe-cores: 56 specialized cores that form the foundation of the GPU’s compute capability.
- Intel Xe Matrix Extensions (XMX) Engines: 448 engines with deep systolic arrays tailored for accelerating the dense matrix and vector operations common in AI and deep learning models.
- Vector Engines: 448 vector engines that complement the XMX units for broader parallel-processing workloads.
- Ray Tracing Units: 56 units that improve visualization capabilities through hardware-accelerated ray tracing.
Memory Hierarchy
- HBM2e: 48 GB of High Bandwidth Memory delivering 1.23 TB/s of bandwidth, essential for large models and datasets such as those used for multimodal embeddings.
- Cache: 28 MB of L1 and 108 MB of L2 cache keep data close to the compute units to minimize latency.
Connectivity
- PCIe Gen 5: A fast PCIe Gen 5 x16 host interface enables high-speed data transfer between the CPU and the GPU.
- Software Ecosystem (oneAPI): The Intel Data Center GPU Max Series is designed around the open, standards-based Intel oneAPI programming model. Developers can accelerate AI pipelines with frameworks optimized for Intel architectures (CPUs and GPUs), such as Hugging Face Transformers, PyTorch, and Intel Extension for PyTorch, without being locked into proprietary software (a quick device-visibility check follows this list).
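Before running anything heavier, it can be worth confirming that the Max 1100 is actually visible to PyTorch. The sketch below assumes IPEX is installed and the GPU is exposed through the torch.xpu backend.

```python
# Quick sanity check (a sketch): list the Intel GPUs PyTorch can see via IPEX.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (enables torch.xpu)

if torch.xpu.is_available():
    for i in range(torch.xpu.device_count()):
        # On a Max 1100 instance this should report something like
        # "Intel(R) Data Center GPU Max 1100".
        print(i, torch.xpu.get_device_name(i))
else:
    print("No XPU detected; falling back to CPU.")
```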
What is the purpose of this code?
This code demonstrates how to set up a multimodal database using Chroma as the vector database that holds image and text embeddings, and then search it with a text query for related images or metadata. It also shows how to use Intel Extension for PyTorch (IPEX) to take advantage of Intel’s hardware acceleration for PyTorch and speed up computation on Intel devices such as CPUs or XPUs (Intel’s GPUs and accelerators).
The code’s primary elements are listed below (a condensed sketch follows the list):
- Embedding Images and Text: Embeddings for images and text are generated with OpenCLIP, a CLIP-based model, and stored in a database for convenient access. OpenCLIP was chosen for its robust performance across a range of benchmarks and its readily available pre-trained models.
- Chroma Database: Chroma stores the embeddings in a persistent database, making it possible to quickly retrieve the most similar results for a text query. ChromaDB was chosen for its emphasis on developer experience, the ease of setting up persistent multimodal collections, and its Python-native API.
- Device Detection with IPEX: A helper function checks whether an XPU is available for hardware acceleration. Combined with IPEX, which optimizes operations such as embedding generation, this configuration speeds up data processing and is well suited to high-performance applications.
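A condensed sketch of this setup is shown below. It assumes the chromadb, open-clip-torch, and pillow packages are installed; the collection name multimodal_images and the image paths are illustrative placeholders rather than values from the original code.

```python
# Sketch: persistent Chroma collection with OpenCLIP embeddings and an image loader.
import chromadb
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader

# Persistent client: embeddings are stored on disk and survive restarts.
client = chromadb.PersistentClient(path="./image_vectordb")

# OpenCLIP embeds both images and text into a shared space;
# ImageLoader reads images from the URIs registered below.
collection = client.get_or_create_collection(
    name="multimodal_images",
    embedding_function=OpenCLIPEmbeddingFunction(),
    data_loader=ImageLoader(),
)

# Register images by URI; Chroma loads and embeds them with OpenCLIP.
collection.add(
    ids=["car_001", "car_002"],
    uris=["images/black_benz.jpg", "images/red_suv.jpg"],
    metadatas=[{"label": "black sedan"}, {"label": "red suv"}],
)
```

Because the embedding function and data loader are attached to the collection, Chroma handles image loading and embedding automatically when items are added or queried.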
Applications and Use Cases
You can use this code in any situation where you need to:
- Store multimodal data: When you need a fast, scalable way to store and retrieve text, images, or both.
- Image Search: E-commerce platforms, image search engines, and recommendation systems all benefit from the ability to query images with textual descriptions. For example, searching for “black Mercedes-Benz” returns images of visually similar cars (see the query sketch after this list).
- Cross-modal Retrieval: When you need to retrieve one modality (images) based on another (text), or vice versa. This pattern is common in systems such as caption-based image search and visual question answering.
- Recommendation systems: Similarity-based queries can point users to movies, products, or other content that is semantically related to their query.
- AI-based applications: Ideal for machine learning pipeline tasks such as building training data, extracting features, or preparing data for multimodal models.
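As an example of the image-search use case, a text-to-image query against the collection built earlier might look like the sketch below (the query text and n_results are illustrative).

```python
# Sketch: query the multimodal collection with text and retrieve matching image URIs.
results = collection.query(
    query_texts=["black Mercedes-Benz"],
    n_results=3,
    include=["uris", "distances", "metadatas"],
)

# Chroma returns one result list per query text; lower distance = closer match.
for uri, dist in zip(results["uris"][0], results["distances"][0]):
    print(f"{uri}  (distance: {dist:.4f})")
```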
Requirements (the imports are shown in use after this list):
- torch for deep learning tasks.
- intel_extension_for_pytorch (IPEX) for optimized PyTorch performance on Intel hardware.
- chromadb to create and query a persistent multimodal vector database.
- matplotlib to display images.
- OpenCLIPEmbeddingFunction and ImageLoader from chromadb.utils for embedding extraction and image loading.
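Pulled together, the dependencies above plus a small matplotlib helper to display the top results might look like this (a sketch; Pillow is assumed to be installed for image loading, and results comes from the query sketch earlier).

```python
# Sketch: display the images returned by collection.query() with matplotlib.
import matplotlib.pyplot as plt
from PIL import Image

def show_results(results, max_images=3):
    """Plot the images whose URIs were returned by the query."""
    uris = results["uris"][0][:max_images]
    fig, axes = plt.subplots(1, len(uris), figsize=(4 * len(uris), 4))
    if len(uris) == 1:
        axes = [axes]
    for ax, uri in zip(axes, uris):
        ax.imshow(Image.open(uri))
        ax.set_title(uri)
        ax.axis("off")
    plt.show()

show_results(results)
```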