Meta launches new Llama 3.1 models, including the long-anticipated 405B-parameter version.
Meta released Llama 3.1, a multilingual LLM collection. Llama 3.1 includes pretrained and instruction-tuned text in/text out open source generative AI models with 8B, 70B, and 405B parameters.
Today, IBM watsonx.ai will offer the instruction-tuned Llama 3.1-405B, the largest and most powerful open source language model available and one that is competitive with the best proprietary models. It can be deployed on-premises, in a hybrid cloud environment, or on IBM Cloud.
Llama 3.1 follows the April 18 debut of Llama 3 models. Meta stated in the launch release that “[their] goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across LLM capabilities such as reasoning and coding.”
Llama 3.1’s debut today shows tremendous progress towards that goal, from dramatically enhanced context length to tool use and multilingual features.
A significant step towards open, responsible, accessible AI innovation
Meta and IBM launched the AI Alliance in December 2023 with over 50 global founding members and collaborators. The AI Alliance unites leading business, startup, university, research, and government organisations to guide AI’s evolution to meet society’s requirements and complexities. Since its formation, the Alliance has grown to over 100 members.
Additionally, the AI Alliance promotes an open community that helps developers and researchers accelerate responsible innovation while maintaining trust, safety, security, diversity, scientific rigour, and economic competitiveness. To that aim, the Alliance supports initiatives that develop and deploy benchmarks and evaluation standards, address society-wide issues, enhance global AI capabilities, and promote safe and useful AI development.
Llama 3.1 gives the global AI community an open, state-of-the-art model family and development ecosystem to explore, experiment, and responsibly scale new ideas and techniques. The release features strong new models, system-level safety safeguards, cyber security evaluation methods, and improved inference-time guardrails. These resources promote generative AI trust and safety tool standardisation.
How Llama 3.1-405B compares to top models
The April release of Llama 3 highlighted upcoming Llama models with “over 400B parameters” and some early model performance evaluation, but their exact size and details were not made public until today’s debut. Llama 3.1 improves all model sizes, but the 405B open source model matches leading proprietary, closed source LLMs for the first time.
Looking beyond numbers
Performance benchmarks are not the only factor when comparing the 405B to other cutting-edge models. Llama 3.1-405B may be built upon, modified, and run on-premises, unlike its closed source contemporaries, which can change their model without notice. That level of control and predictability benefits researchers, businesses, and other entities that seek consistency and repeatability.
Effective uses for Llama 3.1-405B
IBM, like Meta, believes open models improve product safety, innovation, and the AI market. An advanced 405B-parameter open source model offers unique potential and use cases for organisations of all sizes.
Aside from inference and text creation, which may require quantisation or other optimisation approaches to execute locally on most hardware systems, the 405B can be used for:
Synthetic data generation: Synthetic data can fill the gap in pre-training, fine-tuning, and instruction tuning when data is limited or expensive. The 405B can generate high-quality task- and domain-specific synthetic data for LLM training. IBM’s Large-scale Alignment for chatBots (LAB) is a phased-training approach that efficiently updates LLMs with synthetic data while preserving the model’s existing knowledge.
Knowledge distillation: The 405B model’s knowledge and emergent abilities can be distilled into a smaller model, combining the capabilities of a large “teacher” model with the fast, cost-effective inference of a “student” model (such as an 8B or 70B Llama 3.1). Effective Llama-based models like Alpaca and Vicuna relied on knowledge distillation, particularly instruction tuning on synthetic data generated by larger GPT models.
LLM-as-a-judge: The subjectivity of human preferences and the inability of benchmarks to fully approximate them make LLM evaluation difficult. The Llama 2 research paper showed that larger models can serve as impartial judges of response quality in other models. Learn more about LLM-as-a-judge’s efficacy in this 2023 article.
A powerful domain-specific fine-tune: Many leading closed models allow fine-tuning only on a case-by-case basis, for older or smaller model versions, or not at all. Meta has made Llama 3.1-405B available for further pre-training (to update the model’s general knowledge) or domain-specific fine-tuning, which is coming soon to the watsonx Tuning Studio.
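The LLM-as-a-judge use case in the list above can be sketched as a pairwise comparison prompt plus a small parser for the judge’s verdict. This is a minimal illustration only: the prompt wording, the [[A]]/[[B]]/[[C]] verdict markers, and the function names are assumptions made for this example, not a format defined by Meta or IBM, and the call to the judge model itself is deliberately left out.

```python
import re
from typing import Optional

# Hypothetical prompt template; the wording and verdict markers are assumptions.
JUDGE_TEMPLATE = """You are an impartial judge. Compare the two answers to the question below.
Finish with exactly one verdict: [[A]] if answer A is better, [[B]] if answer B is better, [[C]] for a tie.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}
"""


def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Fill the template with the question and the two candidate answers."""
    return JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )


def parse_verdict(judge_output: str) -> Optional[str]:
    """Return 'A', 'B' or 'C' from the last verdict marker, or None if absent."""
    matches = re.findall(r"\[\[([ABC])\]\]", judge_output)
    return matches[-1] if matches else None


prompt = build_judge_prompt("What is 2+2?", "4", "5")
# In practice `prompt` would be sent to a large judge model such as the 405B;
# here a hand-written reply stands in for the model's output.
print(parse_verdict("Answer A is correct and answer B is not. [[A]]"))
```

A common precaution with this pattern is to run each comparison twice with the answer order swapped, since a judge model may favour whichever answer appears first.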
To deploy Llama 3.1 models, Meta AI “strongly recommends” using a platform like IBM watsonx that provides model evaluation, safety guardrails, and retrieval augmented generation.
Every Llama 3.1 size gets upgrades
The long-awaited 405B model may be the most notable component of Llama 3.1, but it’s hardly the only one. Llama 3.1 models share the dense transformer design of Llama 3, but they are much improved at all model sizes.
Longer context windows
All pre-trained and instruction-tuned Llama 3.1 models have context lengths of 128,000 tokens, roughly 16 times the 8,192 tokens in Llama 3. Llama 3.1’s context length matches that of the enterprise version of GPT-4o, is substantially longer than that of GPT-4 (or ChatGPT Free), and is comparable to Claude 3’s 200,000 token window. Because Llama 3.1 can be deployed on the user’s own hardware or through a cloud provider, its context length is not curtailed during periods of high demand. Llama 3.1 also has few usage restrictions.
An LLM can consider or “remember” a certain amount of tokenised text (called its context window) at any given moment. To continue, a model must trim or summarise a conversation, document, or code base that exceeds its context length. Llama 3.1’s extended context window lets models have longer discussions without forgetting details and ingest larger texts or code samples during training and inference.
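As a rough illustration of the trimming described above, the sketch below keeps only the most recent messages that fit within a token budget. It is a simplified example, not how any particular product manages context: the function names are hypothetical, and the word-based counter is a crude stand-in for a real tokeniser.

```python
from typing import Callable, List


def trim_to_context(
    messages: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the most recent messages whose combined token count fits max_tokens."""
    kept: List[str] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokeniser: one token per whitespace-separated word."""
    return len(text.split())


history = ["first message here", "second message", "third and final message"]
print(trim_to_context(history, 6, rough_token_count))
```

With a budget of 6 "tokens", the oldest message is dropped and the two most recent ones are kept. A larger context window simply raises the budget, so fewer messages need to be discarded.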
Text-to-token conversion doesn’t have a defined “exchange rate,” but 1.5 tokens per word is a good estimate. Thus, Llama 3.1’s 128,000 token context window holds roughly 85,000 words. Hugging Face’s Tokenizer Playground lets you test multiple tokenisers on text inputs.
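The estimate above is simple arithmetic, shown here as a short sketch. The 1.5 tokens-per-word ratio is the rough figure quoted in the text, not a property of any specific tokeniser, and the function name is made up for this example.

```python
TOKENS_PER_WORD = 1.5  # rough estimate from the text, not a fixed ratio
LLAMA_3_1_CONTEXT = 128_000  # tokens
LLAMA_3_CONTEXT = 8_192  # tokens


def estimate_words(tokens: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Approximate how many words fit in a given number of tokens."""
    return round(tokens / tokens_per_word)


print(estimate_words(LLAMA_3_1_CONTEXT))  # 85333
print(estimate_words(LLAMA_3_CONTEXT))  # 5461
```

Actual counts vary with language and content type; code and non-English text often use more tokens per word than plain English prose.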
Llama 3.1 models benefit from Llama 3’s new tokeniser, which encodes language more effectively than Llama 2.
Protecting safety
Meta has expanded context length cautiously and thoroughly, in line with its responsible innovation approach. Previous experimental open source efforts produced Llama derivatives with 128,000 or 1M token windows. These projects reflect Meta’s commitment to open models; however, they should be approached with caution: without strong countermeasures, lengthy context windows “present a rich new attack surface for LLMs,” according to a recent study.
Fortunately, Llama 3.1 adds inference guardrails. The release includes direct and indirect prompt injection filtering from Prompt Guard and updated Llama Guard and CyberSec Eval. CodeShield, a powerful inference time filtering technology from Meta, prevents LLM-generated unsafe code from entering production systems.
As with any generative AI solution, models should be deployed on a secure, private, and safe platform.
Multilingual models
Pretrained and instruction-tuned Llama 3.1 models of all sizes are now multilingual. In addition to English, Llama 3.1 models are conversant in Spanish, Portuguese, Italian, German, and Thai. Meta said “a few other languages” are undergoing post-training validation and may be released in the future.
Optimised for tools
Meta optimised the Llama 3.1 Instruct models for “tool use,” allowing them to interface with applications that extend the LLM’s capabilities. Training covers generating tool calls for specific search, image generation, code execution, and mathematical reasoning tools, as well as zero-shot tool use: the capacity to integrate smoothly with tools not previously encountered in training.
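On the application side, tool use generally means parsing a structured call emitted by the model and routing it to real code. The sketch below shows that idea under stated assumptions: the JSON shape with "name" and "arguments" keys, the two toy tools, and the function names are all hypothetical and are not Llama 3.1’s actual tool-call format, which is documented in Meta’s model card.

```python
import json
from typing import Any, Callable, Dict


def add(a: float, b: float) -> float:
    """Toy maths tool."""
    return a + b


def word_count(text: str) -> int:
    """Toy text tool."""
    return len(text.split())


# Registry mapping tool names the model may emit to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "word_count": word_count}


def run_tool_call(model_output: str) -> Any:
    """Parse a JSON tool call (assumed format) and dispatch it to the registry."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# A hand-written string stands in for the model's output here.
print(run_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

In a full loop, the tool’s result would be passed back to the model so it can compose its final answer. Rejecting unknown tool names, as above, is one simple safeguard against the model requesting something the application never registered.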
Getting started with Llama 3.1
Meta’s latest version allows you to customise state-of-the-art generative AI models for your use case.
IBM supports Llama 3.1 to promote open source AI innovation and give clients access to best-in-class open models in watsonx, including third-party models and the IBM Granite model family.
IBM watsonx allows clients to deploy open source models like Llama 3.1 on-premises or in their preferred cloud environment and to use intuitive workflows for fine-tuning, prompt engineering, and integration with enterprise applications. Clients can build business-specific AI apps, manage data sources, and accelerate safe AI workflows on one platform.