Search your sensitive data faster, and safely, with Google Distributed Cloud's new AI search solution
Generative AI lets organizations process and analyze data in new ways, uncover hidden insights, boost productivity, and build new applications. However, requirements for low latency, regulatory compliance, and data sovereignty can be hard to meet. When sensitive data must stay in specific locations, strict regulations apply, and fast responses are required, it can be difficult to benefit from the cloud's innovation, scalability, and cost efficiency.
With Google Distributed Cloud (GDC), Google's AI services are available wherever you need them, whether at the edge or in your own data center. GDC is a fully managed hardware and software solution with a broad set of services, designed for AI and data-intensive workloads. You can run it air-gapped from the public internet or connected to Google Cloud, and it comes in a range of scalable hardware form factors, with leading independent software vendor (ISV) solutions integrated through GDC Marketplace.
In this post, we take a closer look at Google Distributed Cloud's new AI-optimized servers with NVIDIA H100 Tensor Core GPUs and at its gen AI search packaged solution, now available in preview. Together, they bring the increasingly popular retrieval-augmented generation (RAG) pattern to your on-premises environment, enabling multimodal, multilingual natural-language search across your text, image, voice, and video data.
Infrastructure optimized for gen AI
GDC air-gapped now includes new servers with NVIDIA H100 GPUs, built on the NVIDIA Hopper architecture, and 5th Gen Intel Xeon Scalable processors. These servers bring the GPU-optimized A3 VM family, optimized for NVIDIA NVLink interconnect, to Google Distributed Cloud, enabling faster shared compute and memory for AI workloads such as large language models (LLMs) with up to 100 billion parameters. GDC also expands the set of supported NVIDIA Multi-Instance GPU (MIG) profiles, adding new GPU slicing schemes (both uniform and mixed-mode) and dynamic allocation of GPU resources, so you can meet the demands of AI services at a lower total cost of ownership.
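To make the difference between uniform and mixed-mode slicing concrete, here is a simplified Python sketch that checks whether a set of MIG instances fits on a single GPU. It assumes the 80 GB H100 variant and uses NVIDIA's per-profile compute and memory slice counts for a few common profiles. It checks only the slice budget, which is a necessary but not sufficient condition (real MIG placement has further constraints), and the function names are hypothetical; this is not how GDC allocates GPUs.

```python
# Assumption: H100 80GB, which exposes 7 compute slices and 8 memory slices.
# Each profile maps to (compute slices, memory slices) it consumes.
H100_80GB_PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
TOTAL_COMPUTE, TOTAL_MEMORY = 7, 8


def fits_budget(layout: list[str]) -> bool:
    """Budget check only: ignores NVIDIA's placement rules, so True is not a guarantee."""
    compute = sum(H100_80GB_PROFILES[p][0] for p in layout)
    memory = sum(H100_80GB_PROFILES[p][1] for p in layout)
    return compute <= TOTAL_COMPUTE and memory <= TOTAL_MEMORY


def slicing_mode(layout: list[str]) -> str:
    """Uniform: every instance uses the same profile; mixed-mode: profiles differ."""
    return "uniform" if len(set(layout)) == 1 else "mixed-mode"


layouts = {
    "seven small instances": ["1g.10gb"] * 7,
    "one GPU, four sizes": ["3g.40gb", "2g.20gb", "1g.10gb", "1g.10gb"],
    "oversubscribed": ["4g.40gb", "4g.40gb"],
}
for name, layout in layouts.items():
    print(f"{name}: {slicing_mode(layout)}, fits={fits_budget(layout)}")
```

Mixed-mode slicing is what lets one physical GPU serve, for example, a mid-sized model on a 3g instance alongside lighter services on 1g instances.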
On-premises conversational search that is ready to use
Google Distributed Cloud's new gen AI search solution is a ready-to-deploy, on-premises conversational search solution built on the 9-billion-parameter Gemma 2 LLM. You can easily ingest your sensitive on-premises data into the solution and quickly find the most relevant information and content with natural-language search, while your search queries and data stay on-premises. This boosts employee productivity and knowledge sharing.
To help reduce hallucinations, responses also include citation links to the source documents.
The Google Distributed Cloud gen AI search solution uses a RAG architecture that combines the strengths of generative AI and traditional search to produce more accurate results. Before a user query is sent to the LLM for answer generation, it is augmented with relevant on-premises data. Other out-of-the-box core integrations include Vertex AI pre-trained APIs for multimodal and multilingual data ingestion across text, images, and audio: translation for 105 languages, speech-to-text for 13 languages, and optical character recognition for 46 supported and 24 experimental languages. The solution also integrates the AlloyDB Omni database service for storing embeddings and running semantic search over ingested data.
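The retrieve-then-augment flow described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not the GDC implementation: a toy bag-of-words vector stands in for a real embedding model, an in-memory list stands in for AlloyDB Omni, and the `Document`, `retrieve`, and `build_prompt` names are hypothetical. The sketch stops at building the augmented prompt; in the packaged solution, the LLM that receives it is Gemma 2.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str  # hypothetical source identifier, reused as the citation target
    text: str


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Semantic-search stand-in: rank documents by similarity and keep the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[Document]) -> str:
    """Augment the user query with retrieved passages, numbered so answers can cite them."""
    sources = "\n".join(f"[{i}] ({d.doc_id}) {d.text}" for i, d in enumerate(hits, 1))
    return (
        "Answer using only the numbered sources below and cite them as [n].\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


corpus = [
    Document("hr-policy.pdf", "Employees accrue 20 vacation days per year."),
    Document("it-guide.docx", "Reset your VPN token through the internal IT portal."),
    Document("safety.txt", "Hard hats are required on the factory floor."),
]
query = "How many vacation days do employees get?"
hits = retrieve(query, corpus, k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

Carrying each passage's `doc_id` into the prompt is what allows the generated answer to point back to its source documents, which is the basis for the citation links mentioned earlier.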
Thanks to Google Distributed Cloud's open cloud strategy, you can also tailor the solution to your needs by swapping out any of its components: for example, using a different database service such as Elasticsearch, other open-source models and LLMs, or your own proprietary models.
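Swappable components usually come down to coding the pipeline against interfaces rather than concrete services. The Python sketch below expresses that with `typing.Protocol`; every class name is hypothetical, and the toy retrievers and "generator" merely stand in for real services such as AlloyDB Omni, Elasticsearch, Gemma 2, or a proprietary model. It illustrates the design idea only and does not reflect how GDC wires its components internally.

```python
from typing import Protocol


class Retriever(Protocol):
    """Any component that returns the k most relevant passages for a query."""

    def search(self, query: str, k: int) -> list[str]: ...


class Generator(Protocol):
    """Any component that turns an augmented prompt into an answer."""

    def generate(self, prompt: str) -> str: ...


class KeywordRetriever:
    """Hypothetical retriever ranking passages by shared words (vector-DB stand-in)."""

    def __init__(self, passages: list[str]) -> None:
        self.passages = passages

    def search(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        return sorted(
            self.passages,
            key=lambda p: len(terms & set(p.lower().split())),
            reverse=True,
        )[:k]


class SubstringRetriever:
    """A second hypothetical retriever: keeps passages containing the whole query."""

    def __init__(self, passages: list[str]) -> None:
        self.passages = passages

    def search(self, query: str, k: int) -> list[str]:
        return [p for p in self.passages if query.lower() in p.lower()][:k]


class ExtractiveGenerator:
    """Hypothetical LLM stand-in: returns the first line (top source) of the prompt."""

    def generate(self, prompt: str) -> str:
        lines = prompt.splitlines()
        return lines[0] if lines else ""


class SearchPipeline:
    """RAG pipeline that depends only on the Retriever and Generator interfaces."""

    def __init__(self, retriever: Retriever, generator: Generator) -> None:
        self.retriever = retriever
        self.generator = generator

    def answer(self, query: str, k: int = 2) -> str:
        passages = self.retriever.search(query, k)
        prompt = "\n".join(passages) + f"\n\nQuestion: {query}"
        return self.generator.generate(prompt)


passages = ["vpn tokens are reset in the it portal", "vacation policy allows 20 days"]
# Swapping the retriever needs no change to SearchPipeline itself.
keyword_answer = SearchPipeline(KeywordRetriever(passages), ExtractiveGenerator()).answer("vacation days")
substring_answer = SearchPipeline(SubstringRetriever(passages), ExtractiveGenerator()).answer("it portal")
print(keyword_answer)
print(substring_answer)
```

Because `Protocol` relies on structural typing, a replacement component only needs matching method signatures; it does not have to inherit from a shared base class.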
Summary
Google Distributed Cloud (GDC) now offers a generative AI search solution that runs on-premises. Currently in preview, it uses NVIDIA H100 GPUs and the Gemma 2 LLM to deliver fast, secure, conversational search across text, image, voice, and video data. Its retrieval-augmented generation (RAG) architecture produces more accurate output, and citations to source documents help reduce inaccuracies. The solution is highly configurable, letting you integrate different databases and LLMs. Because search queries and data stay on-premises, it helps you meet data sovereignty and compliance requirements.