It’s no secret that creating your own conversational AI system can now be done with great performance on smaller hardware platforms, accessibility to the larger AI development community, and efficiency thanks to open-source large language models (LLMs) like Falcon-7B and Zephyr-7B. These are 7-billion parameter models, which are smaller than 13-billion and 70-billion+ parameter models. The “7B” tag denotes that these are 7-billion parameter models.
Although GPT-4 performs better than these smaller models at the moment, they predict that by 2024, the difference between the large, closed-source models and the open-source tiny models will be less. They give a quick overview of the two open-source 7B models mentioned earlier in this post, along with instructions on how to use them with the newest Intel GPU.
Falcon-7B
The Technology Innovation Institute created the untrained, pre-trained model Falcon-7B. It isn’t directly appropriate for chat on its own. Once adjusted to your dataset, it becomes a powerful LLM that can generate text for chat, text categorization, and question-answering. I T is an Apache License, Version 2.0-licensed open-source model for academic and commercial use.
Only the English and French versions of the RefinedWeb dataset were used to train the Falcon-7B. It is an enormous 2.8TB web-only dataset made up of 968M distinct web pages that was selected from a larger dataset from CommonCrawl. A larger and more potent LLM from the same family as Falcon-7B, Falcon-40B, was also trained using I T.
Zephyr-7B
Zephyr-7B is a model developed by Hugging Face that was released under the MIT License after being adjusted from Mistral-7B. Zephyr, in contrast to Falcon-7B, is already optimized for chat and ready for a chat program right out of the box. The UltraChat and UltraFeedback datasets, which are explained in greater detail below, were used to fine-tune Zephyr-7B.
UltraChat
A fictitious dataset of conversations produced using ChatGPT is called UltraChat. One phony conversation clip, for instance, has:
What are the differences in innovation that you see between big and small businesses?
They are an AI language model, so they are not biased, but they may still offer information based on my training set. Innovation between large and small businesses might differ in a number of ways: Resources: Research and development, experimentation, and prototyping can be funded more easily by large companies. However, tiny businesses might not have as much money as larger ones, so they have to be more deliberate in their innovation efforts.
This is just a small portion of the conversation because each training case is rather long. These kind of dialogue examples cover a vast range of subjects and number 774K. The discussion involves “rewriting, continuation, summarization, and inference,” as stated in the dataset description.
SuperFeedback
64K prompts from several models, such as GPT-3.5 Turbo, MPT-30B-Chat, Alpaca 7B, Pythia-12B, StarChat, and others, are gathered in the UltraFeedback dataset. For every prompt, four distinct responses are produced, for a total of 256K samples. The gathered samples are then annotated using GPT-4.
Using the Intel Developer Cloud to get started
To run the LLM examples yourself, you can start for free with a Jupyter Notebook hosted on the Intel Developer Cloud. This notebook combines the newest Intel AI hardware with Intel-optimized AI applications. The aforementioned models are available for use by I T. You can start using these two models right away because they were recently included to the Simple LLM Inference notebook. Launch the Jupyter Notebook by clicking the Launch button under “Simple LLM Inference: Playing with Language Models” on the homepage.
Remarks regarding the code
The Intel Developer Cloud instance comes with the necessary Python frameworks pre-installed, including torch, intel_extension_for_pytorch, and transformers. Fill the Falcon 7-B and Zephyr-7B models using the standard transformer framework:
from import transformers AutoTokenizer, AutoModelForCausalLM
The tokenizer and model are really instantiated here:
self.tokenizer = AutoTokenizer.from_pretrained(
model_id_or_path,
trust_remote_code=True,
cache_dir=”/home/common/data/Big_Data/GenAI/”
)
self.model = (
AutoModelForCausalLM.from_pretrained(
model_id_or_path,
low_cpu_mem_usage=True,
trust_remote_code=True,
torch_dtype=torch.bfloat16,
cache_dir=”/home/common/data/Big_Data/GenAI/”,
)
.to(self.device)
.eval()
)
Both PyTorch and Intel Extension for PyTorch are pre-installed in the conda environment pytorch-gpu that is included with the notebook in order to maximize the performance of the newest Intel Data Center Series GPU Max 1100. If necessary, you can install these on your own instances by visiting the GitHub links.
The Intel Extension for PyTorch is utilized for two primary purposes, which are:
ipex.optimize_transformers(self.model, dtype=torch.bfloat16)
and
ipex.optimize(self.model, dtype=torch.bfloat16)
The loaded LLM model is denoted by self.model, and the data type is torch.bfloat16, which uses a lower data type on the Intel GPU to improve performance. One of the wonderful things about this extension is that if you are coming from another platform, you would need to modify very little code. All you should need to do is make these little code modifications and switch the device to xpu.
In brief
Compared to their much bigger model counterparts (such as Falcon-180B), Falcon-7B and Zephyr-7B are smaller LLMs, but they nevertheless provide performant and efficient inference. One model that can be optimized for many text tasks, such as question answering, text classification, and chat, is Falcon-7B. Zephyr-7B is an excellent conversation device right out of the box because it was already optimized from a different model named Mistral-7B.
After registration, go to the Intel Developer Cloud homepage and select “Simple LLM Inference: Playing with Language Models” to utilize any of the two models using the given sample Jupyter Notebook. You are free to bring your own Hugging Face models and check these ones out as well. They would be interested in knowing how you used these models on the Intel Developer Cloud.
Notice Regarding the Use of Large Language Models
Although LLMs such as Falcon-7B and Zephyr-7B are excellent resources for text production, please be advised that occasionally they may yield unexpected, biased, or inconsistent results with the prompt. It is advisable to go over the generated text very carefully and think about the usage and context of these models.
Additionally, using these models must abide by the terms of their license agreements, ethical standards, and best practices for A I.
[…] reducing GenAI Projects computation costs, Intel’s well-proven open AI solutions provide […]