Question your documents: Using PaLM 2 and Document AI for question answering
Physical and digital documents hold a wealth of information, but only if that information can be put to work supporting people in their duties. Internal IT and content management teams have long aimed to let knowledge workers engage with a document, or better yet a corpus of documents, without having to search through them by hand.
For years this objective remained out of reach because, before generative AI models like PaLM 2, technologies struggled to supply the contextual understanding needed to answer questions across different kinds of documents. Today, however, developers can combine text embedding models, PaLM 2, and Google Cloud Document AI to build an "ask your documents" application for staff members. In this article, we'll show you how.
Why build a document Q&A application with PaLM 2 and Document AI?
Extracting information from a given document to provide natural-language answers to queries is known as document question answering, or document Q&A. This kind of workflow can be applied across many industries and disciplines. For example:
- Attorneys and other legal professionals can use document Q&A to search legislation, case law, and legal documents for facts and precedents relevant to their cases.
- Teachers and students can use document Q&A to understand concepts found in textbooks, research papers, and other educational resources.
- IT support staff can use document Q&A to help resolve technical issues by quickly locating information in technical documentation and troubleshooting guides.
With Retrieval-Augmented Generation (RAG), you can produce more precise and insightful answers to questions by grounding them in relevant data from a knowledge base, such as a vector store. Document AI's Enterprise Document OCR (optical character recognition) and PaLM 2 provide strong capabilities for this job.
The solution and architecture proposed in this blog post provide a serverless, scalable framework for putting a RAG-based architecture into practice at scale. We focus here on Q&A use cases for long documents.
High-level architecture
For this article, we used Document AI, which offers high-quality, enterprise-ready document processing models. This serverless, scalable, fully managed service can handle millions of documents without requiring you to spin up infrastructure.
More precisely, we extracted text and layout data from document files using Enterprise Document OCR, a pre-trained model. We then used the textembedding-gecko model from Vertex AI to generate text embeddings, that is, vector representations of text.
Finally, we used the Vertex AI text-bison foundation model from the PaLM 2 family to answer questions grounded in the stored embeddings. The result is a serverless architecture for document Q&A built on Document AI and generative AI foundation models.
Any vector store can be used to hold the embeddings. To keep the demonstration simple for a small number of documents, we did not use a dedicated vector store in this blog post; instead, the vectors are kept in an in-memory data structure, sketched below. To scale the solution, you can store the embeddings in Vector Search, a petabyte-scale vector store.
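As a rough illustration, the in-memory "vector store" can be as simple as a pandas DataFrame with one row per chunk; the column names here are our own choice, not a fixed schema:

```python
# A minimal sketch of an in-memory stand-in for a vector store: one row per
# document chunk, holding the chunk's text and its embedding vector.
# It is populated in Step 5 below.
import pandas as pd

vector_store = pd.DataFrame(columns=["chunk_text", "embedding"])
```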
At query time, the flow is:

- A user asks a question about the documents.
- The question is converted to an embedding, and the most similar embeddings are retrieved from the vector store.
- The PaLM text-bison model generates an answer grounded in the retrieved context, and the answer is returned to the user.
Implementation
Having covered the architecture, let's walk through the general steps needed to build a Q&A tool, so that you can design question-answering applications using the Vertex AI SDKs.
You can follow along with this notebook, which offers more detailed, step-by-step implementation instructions.
Step 1: Create a Document AI OCR processor
We used Alphabet's earnings releases as our example. The PDFs are available in a publicly accessible Google Cloud Storage bucket.

First, create a Document AI OCR processor. A Document AI processor is an interface between a document file and a machine learning model that performs document processing actions. Processors can be used to classify, split, parse, or analyze a document, and each Google Cloud project must create its own processor instances.

A Document AI processor takes a PDF or image file as input and outputs the data in the Document format. We used the Python Client for Document AI library to create an Enterprise Document OCR processor, and then called this processor to process documents. A minimal sketch follows.
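Here is a hedged sketch of creating the processor with the Python Client for Document AI; the project_id, location, and display_name values are placeholders:

```python
# A minimal sketch of creating an Enterprise Document OCR processor.
from google.api_core.client_options import ClientOptions
from google.cloud import documentai

def create_ocr_processor(project_id: str, location: str, display_name: str):
    # Document AI uses regional endpoints, e.g. "us-documentai.googleapis.com".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )
    parent = client.common_location_path(project_id, location)
    # "OCR_PROCESSOR" is the processor type for Enterprise Document OCR.
    return client.create_processor(
        parent=parent,
        processor=documentai.Processor(
            display_name=display_name, type_="OCR_PROCESSOR"
        ),
    )
```

The returned processor's `name` field is the resource name you pass when processing documents.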
Step 2: Process the documents
With the Enterprise Document OCR processor you just created, you can begin processing documents. To start document processing, provide the file path and processor name as input, along the lines of the sketch below.
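A hedged sketch, assuming a local PDF file and the processor resource name from Step 1:

```python
# A minimal sketch of sending one PDF through the OCR processor.
from google.api_core.client_options import ClientOptions
from google.cloud import documentai

def process_document(processor_name: str, location: str, file_path: str):
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )
    with open(file_path, "rb") as f:
        raw_document = documentai.RawDocument(
            content=f.read(), mime_type="application/pdf"
        )
    result = client.process_document(
        request=documentai.ProcessRequest(
            name=processor_name, raw_document=raw_document
        )
    )
    return result.document  # result.document.text holds the extracted text
```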
Step 3: Chunk the data
Technologies like Vertex AI Search let you select a corpus of documents for certain tasks, but a custom approach can give your business the most flexibility in balancing cost, complexity, and accuracy. For these custom implementations, the best results are obtained when a document's text is divided into manageable "chunks" before being added to the prompt. Chunking is a method for dividing a document into smaller, easier-to-process portions; you can segment the text into sentences, paragraphs, or even sections.
For the PaLM 2 text-bison model, the maximum input size is currently 8,192 tokens. This means a document of up to 8,192 tokens can be processed in a single call to the PaLM API; a larger document must be divided into smaller sections. Another option is the PaLM 2 32K model, currently in preview, which accepts inputs of up to 32,000 tokens. A simple chunking sketch follows.
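One straightforward approach is fixed-size character chunks with a small overlap; the sizes below are illustrative assumptions, not recommendations from the original notebook:

```python
# A minimal sketch of fixed-size chunking with overlap between chunks.
def chunk_text(text: str, chunk_size: int = 5000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character-based chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text(document.text)  # document comes from Step 2
```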
Step 4: Import the models
Next, import the PaLM 2 text-bison and textembedding-gecko models from the Vertex AI SDK for Python; they handle the embedding and answer-generation steps that follow.
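A minimal sketch, assuming placeholder project and location values; the model version suffixes are one available option:

```python
# Load the embedding and text generation models via the Vertex AI SDK.
import vertexai
from vertexai.language_models import TextEmbeddingModel, TextGenerationModel

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
generation_model = TextGenerationModel.from_pretrained("text-bison@001")
```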
Step 5: Get embeddings for each chunk
Using the chunks you created in Step 3, call the Embeddings API to generate an embedding for each chunk, as sketched below. If you did not chunk the documents in Step 3, the Embeddings API can still be used to create the embeddings.
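A minimal sketch that populates the in-memory vector store from earlier; `chunks` and `embedding_model` come from the previous steps:

```python
# Embed each chunk and keep the vector alongside the chunk's text.
import pandas as pd

rows = []
for chunk in chunks:
    vector = embedding_model.get_embeddings([chunk])[0].values
    rows.append({"chunk_text": chunk, "embedding": vector})

vector_store = pd.DataFrame(rows)
```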
Step 6: Query the PaLM text generation APIs
Now let's look at how to get answers from the vector store using the get_context_from_question(question, vector_store, sort_index_value) helper and the PaLM text-bison API. A hedged sketch of such a helper follows.
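The notebook's actual implementation may differ; this sketch embeds the question and ranks the stored chunks by cosine similarity, returning the top chunks as one context string (`embedding_model` comes from Step 4):

```python
import numpy as np

def get_context_from_question(question, vector_store, sort_index_value=3):
    """Return the sort_index_value most similar chunks, joined as context."""
    q = np.array(embedding_model.get_embeddings([question])[0].values)

    def cosine(vector):
        v = np.array(vector)
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))

    scores = vector_store["embedding"].apply(cosine)
    top = vector_store.loc[scores.nlargest(sort_index_value).index]
    return "\n\n".join(top["chunk_text"])
```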
To answer a question, prompt the PaLM 2 text-bison model with the top N results of the vector embedding search. Based on the user's question, the search selects the top N chunks, which are then passed to the generative AI model's prompt as context, as in the sketch below.
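A minimal sketch of the final call; the prompt wording, example question, and generation parameters are illustrative assumptions:

```python
question = "What were Alphabet's revenues in the most recent quarter?"  # example
context = get_context_from_question(question, vector_store, sort_index_value=3)

# Ground the model's answer in the retrieved chunks.
prompt = f"""Answer the question using only the context below.
If the answer is not in the context, say that you don't know.

Context:
{context}

Question: {question}
Answer:"""

response = generation_model.predict(prompt, temperature=0.2, max_output_tokens=256)
print(response.text)
```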
In summary
Congratulations! We hope that after reading this article, you have a better understanding of how to use PaLM 2 and Document AI for question answering.
By now, you ought to be able to:
- Extract text from PDF documents using a Document AI OCR processor.
- Create embeddings for the extracted text using the textembedding-gecko model.
- Use the PaLM text-bison model to answer questions grounded in the stored embeddings.

Please feel free to explore the solution and modify the code in the GitHub repository.
We hope this post showed you how easy it is to build a serverless, RAG-based architecture with Document AI and large language models.
You may also want to check out other Google Cloud products that might better suit your needs, such as:
- Document AI Custom Extractor, to extract specific fields from documents like contracts, invoices, W-2s, and bills of lading.
- Document AI Summarizer to customize summaries based on your preferences for length and format with no training required. It can provide summaries for documents up to 250 pages long, and you won’t need to manage document chunks or model context windows.
- Vertex AI Search and Conversation, an enterprise end-to-end RAG solution, to not only search and summarize digital PDF, HTML, and TXT documents, but also build a chatbot on top of those documents.