SFT LLMs
Customers tell us they see a lot of promise in applying large language models (LLMs) to their data for a variety of emerging generative AI use cases, such as enhancing customer experiences, automating internal procedures, finding and accessing information, and producing new content. There are numerous ways to take advantage of your data; in this blog post, we'll go over some of the most popular strategies and use cases, along with what you should know to get started.
How to use foundation models with your data
Before you can begin to envision a generative AI application, you need to understand how LLMs and other foundation models can interact with your data.
Prompt engineering
The simplest way to enable interactions between a model and your data is to include the data in the instructions, or system prompt, sent to the model. The powerful and appealing feature of this method, that the model doesn't need to be modified or tuned, can be limiting for certain use cases. For instance, while static information can easily be added to a system prompt and used to guide interactions, this isn't the case for frequently updated information like sports scores or airfare prices.
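For concreteness, here's a minimal sketch of this idea using the Vertex AI Python SDK; the project ID, model name, and return policy text are placeholder assumptions, not part of the original post:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

# Static information you control, included directly in the system prompt.
return_policy = (
    "Items may be returned within 30 days of purchase. "
    "Refunds are issued to the original payment method."
)

model = GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=f"Answer questions using only this policy:\n{return_policy}",
)
response = model.generate_content("How long do I have to return an item?")
print(response.text)
```

Because the data lives only in the prompt, no model changes are needed; but every update to the data means rebuilding the prompt, which is why this approach strains under frequently refreshed information.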
Retrieval augmented generation (RAG)
Retrieval augmented generation, or RAG, ensures that model outputs are firmly grounded in your data. AI systems built for RAG search your data for facts relevant to a query, then pass that information into the prompt rather than relying on the model's training knowledge. This is similar to prompt engineering, except that with each interaction the system can find and retrieve fresh context from your data.
The RAG approach, and its growing ecosystem of products (ranging from straightforward database integrations to embeddings APIs and other components for custom systems), supports large-scale and multimodal data, private data that you connect, continuously updated data, and more.
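As a rough illustration, here is a toy RAG loop using the Vertex AI Python SDK, with a two-document in-memory corpus standing in for a real vector database; the documents, model names, and prompt format are illustrative assumptions, and it assumes vertexai.init() has been called as in the earlier snippet:

```python
import numpy as np
from vertexai.language_models import TextEmbeddingModel
from vertexai.generative_models import GenerativeModel

embedder = TextEmbeddingModel.from_pretrained("text-embedding-004")

# A stand-in corpus; in production these would live in a vector database.
docs = [
    "Our refund window is 30 days from the date of purchase.",
    "Support is available Monday through Friday, 9am to 5pm PT.",
]
doc_vecs = np.array([e.values for e in embedder.get_embeddings(docs)])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = np.array(embedder.get_embeddings([query])[0].values)
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in scores.argsort()[::-1][:k]]

query = "How long do I have to return an item?"
context = "\n".join(retrieve(query))

model = GenerativeModel("gemini-1.5-pro")
answer = model.generate_content(
    f"Answer using only this context:\n{context}\n\nQuestion: {query}"
)
print(answer.text)
```

The key difference from plain prompt engineering is the retrieve step: the context is looked up fresh from your data on every interaction instead of being fixed in the prompt.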
Supervised fine-tuning (SFT LLM)
If you want to give a model specific instructions for a clearly defined task, you may want to consider supervised fine-tuning (SFT), also known as Parameter Efficient Fine Tuning (PEFT). Tasks like classification or producing structured outputs from unstructured text can benefit greatly from this.
To perform supervised fine-tuning, you give the model input-output pairs to learn from. For instance, if you want to classify meeting transcripts into categories, the tuning process will need a number of transcripts along with the category you consider appropriate for each one; tuning then teaches the model to reproduce your preferred classification. A sketch of preparing such a dataset follows.
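The field names input_text and output_text below follow one common tuning-data format, but the exact schema depends on the tuning service and model version, so treat this as an assumption and check the docs for yours:

```python
import json

# Hypothetical labelled examples: meeting transcripts paired with the
# category you consider appropriate for each one.
examples = [
    {"input_text": "Transcript: ...discussed Q3 budget overruns and forecasts...",
     "output_text": "finance"},
    {"input_text": "Transcript: ...reviewed the new-hire onboarding checklist...",
     "output_text": "hr"},
]

# Supervised tuning services typically expect one JSON object per line (JSONL).
with open("tuning_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```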
Reinforcement Learning from Human Feedback (RLHF)
What if your objective is difficult to quantify or doesn't break down neatly into categories? Say, for instance, that you want a model to have a specific tone (perhaps a brand voice, or a certain level of formality). Reinforcement Learning from Human Feedback, or RLHF, builds a model that is shaped by human preferences and tailored to your particular requirements.
In a nutshell, the approach looks like this: your data takes the form of input prompts and output responses, but the responses must be given in pairs, two plausible answers where you prefer one over the other. For instance, one might be accurate but generic, while the other is both accurate and written in the linguistic style you want for your final product.
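A hedged sketch of what such preference data might look like; the field names (prompt, chosen, rejected) are illustrative, not a specific service's schema:

```python
import json

# Each record pairs one prompt with two plausible responses and marks
# which one a human preferred.
preference_examples = [
    {
        "prompt": "Summarise our return policy for a customer.",
        # Preferred: accurate AND in the desired friendly brand voice.
        "chosen": "Happy to help! You can return any item within 30 days...",
        # Rejected: accurate but generic in tone.
        "rejected": "Returns are accepted within 30 days of purchase.",
    },
]

with open("preferences.jsonl", "w") as f:
    for ex in preference_examples:
        f.write(json.dumps(ex) + "\n")
```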
Distillation
Distillation is a clever technique that combines two objectives: reducing the size of the model so that it can serve responses more quickly, and making it more task-specific. It works by “teaching” a smaller model from a bigger foundation model while concentrating that instruction on your task and data.
Suppose you want to use a smaller model to rewrite every email you send to make it more formal. To build it, you feed the large model the input (the original text plus the instruction to “make this email more formal”), and it returns the output (the revised email). With your inputs and the large model's outputs in hand, you can then train a small, specialised model to replicate this particular task, as sketched below. You can also supply your own input/output pairs in addition to the ones generated by the foundation model.
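Here's a minimal sketch of the data-generation half of that workflow using the Vertex AI Python SDK; the emails and prompt wording are illustrative, and it assumes vertexai.init() has been called as before:

```python
from vertexai.generative_models import GenerativeModel

teacher = GenerativeModel("gemini-1.5-pro")  # the large "teacher" model

emails = [
    "hey, can u send the report asap? thx",
    "meeting moved to 3, dont be late",
]

# Build (input, output) pairs by asking the teacher to perform the task;
# these pairs then become training data for the small "student" model.
pairs = []
for email in emails:
    result = teacher.generate_content(f"Make this email more formal:\n{email}")
    pairs.append({"input": email, "output": result.text})

# `pairs` can now be written out (e.g., as JSONL, as in the SFT example)
# and used to tune the smaller model on this one specialised task.
```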
Which to choose?
The first thing to consider is whether the model must always cite a source grounded in your data. If so, you'll need to use RAG. Another advantage of RAG is that, depending on who is calling the model, you can control who has access to which grounding data. This improves the interpretability of results and helps you fend off hallucinations.
If those conditions don't apply, you'll need to decide whether prompt engineering is sufficient or the model needs to be tuned. Prompt engineering may be sufficient for small amounts of data, and as context windows expand (Gemini 1.5 demonstrated a 1 million-token window), it is becoming practical for larger amounts of data as well.
If you decide to tune, weigh your alternatives based on how precisely defined, and how hard to measure, the desired model behaviour is. If the output you want is hard to describe and thus likely requires human judgment, RLHF is the best option. Otherwise, choose among the tuning techniques based on your budget, the level of customisation you need, and how quickly you need responses served.
That logic can be abbreviated into a simple decision tree.
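As a rough sketch, the decision tree might look like this in Python; the three yes/no questions are simplifications of the discussion above, and a real project would also weigh budget, customisation needs, and serving speed:

```python
def choose_approach(needs_citations: bool,
                    fits_in_context: bool,
                    behaviour_hard_to_describe: bool) -> str:
    """Illustrative decision logic for picking an approach."""
    if needs_citations:
        return "RAG"                  # ground outputs in your data, with sources
    if fits_in_context:
        return "prompt engineering"   # no tuning needed
    if behaviour_hard_to_describe:
        return "RLHF"                 # human preferences shape the behaviour
    return "SFT or distillation"      # well-defined task with labelled pairs

print(choose_approach(needs_citations=False,
                      fits_in_context=False,
                      behaviour_hard_to_describe=True))  # -> RLHF
```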
How about combining approaches?
You might wonder: why not combine techniques? Suppose, for example, that you want to tune a model to use your brand voice and also want it to generate responses using only your data (RAG). That's feasible too, and frequently the better choice! A model can be fine-tuned and then used with RAG. You can also tune a model with SFT and then apply in-context prompt engineering to it, to ensure the model behaves as intended. In short, you are free to mix and match the techniques described above as you see fit.
Start now!
Start simple. Not only will that get you moving faster, it will also give you a baseline from which to test and experiment to see what works best for your application.
You can try all of these capabilities on Google Cloud. Try prompt engineering and the RAG implementation provided by Vertex AI Agent Builder. If you prefer to implement RAG yourself, you can construct and store embeddings using Google Cloud's Embeddings or Multimodal Embeddings APIs and Vector Search, as in the sketch below. You can also try supervised fine-tuning, RLHF tuning, and distillation, and explore Google Cloud's code samples for guidance.
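For example, generating an embedding with the Vertex AI Python SDK might look like this; the project ID and model name are placeholder assumptions:

```python
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

model = TextEmbeddingModel.from_pretrained("text-embedding-004")
embeddings = model.get_embeddings(["How do I return an item?"])

vector = embeddings[0].values  # a list of floats, ready to index in Vector Search
print(f"Embedding dimensionality: {len(vector)}")
```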