Monday, February 17, 2025

IBM Granite 3.1: Faster, Longer, New Embedding Models & More

Longer context, new embedding models, enhanced performance, and more in IBM Granite 3.1

IBM Granite 3.1 is the latest version of IBM’s Granite line of open, efficient, and enterprise-optimized language models. This package of enhancements, extensions, and new features focuses on improving performance, accuracy, and accountability in critical enterprise use cases such as tool use, retrieval augmented generation (RAG), and scalable agentic AI workflows.

IBM Granite 3.1 builds on the momentum of the recently released Granite 3.0 collection. IBM will continue to deliver improved Granite 3 series models and capabilities in the coming months, with new multimodal features planned for deployment in Q1 2025.

These new Granite models are not IBM’s only recent contributions to the open source LLM ecosystem. The latest release caps a series of cutting-edge open source releases, including a versatile framework for creating AI agents and an easy-to-use tool for extracting crucial data hidden in PDFs, slide presentations, and other file formats that are challenging for models to understand. Combined with Granite 3.1 models, these tools and frameworks give developers advanced possibilities for RAG, AI agents, and other LLM-based workflows.

As usual, all of the offerings covered in this article are released under permissive, standard open source licenses, reflecting IBM’s longstanding dedication to open source.

Granite 3.1 8B raises the bar for lightweight enterprise models

The clearest sign of IBM’s continuing effort to optimise the Granite series is the improvement of its flagship 8B dense model. IBM Granite 3.1 8B Instruct now outperforms most open models in its weight class on average scores across the academic benchmarks that make up the Hugging Face OpenLLM Leaderboard.

The Granite series continues to emphasise efficiency and excellence in enterprise use cases such as agentic AI. This progress is best illustrated by the latest 8B model’s markedly better performance on IFEval, a dataset of tasks that assess a model’s ability to follow specific instructions, and on Multi-step Soft Reasoning (MuSR), whose tasks test reasoning and comprehension of lengthy texts.

Comparison of model performance across Hugging Face OpenLLM Leaderboard benchmarks
Image credit to IBM

Lengthened context

Granite 3.1’s performance gains over Granite 3.0 are reinforced by expanded context windows across all models. IBM Granite 3.1’s 128K token context length is comparable to that of other leading open model series, such as Qwen2.5 and Llama 3.1–3.3.

The context window (or context length) is the amount of text, measured in tokens, that a large language model (LLM) can consider at any one time. With a larger context window, a model can analyse longer inputs, sustain longer continuous exchanges, and incorporate more information into each output. There is no fixed “exchange rate” between tokens and words, but 1.5 tokens per word is a reasonable approximation; 128K tokens is roughly equivalent to a 300-page book.
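The back-of-envelope sizing above can be sketched in a few lines; the 1.5 tokens-per-word ratio comes from the text, while the 300-words-per-page figure is an assumption chosen to match the “300-page book” comparison:

```python
# Rough capacity estimates for a context window, using the ~1.5
# tokens-per-word heuristic mentioned above. Real ratios vary by
# tokenizer and language.
TOKENS_PER_WORD = 1.5   # approximation from the text
WORDS_PER_PAGE = 300    # assumption: typical words on a book page

def max_words(context_tokens: int) -> int:
    """Approximate number of words that fit in a context window."""
    return int(context_tokens / TOKENS_PER_WORD)

def max_pages(context_tokens: int) -> int:
    """Approximate number of book pages that fit in a context window."""
    return max_words(context_tokens) // WORDS_PER_PAGE

print(max_words(128_000))  # ~85,000 words
print(max_pages(128_000))  # ~284 pages, i.e. roughly a 300-page book
```

Under these assumptions, a 128K-token window holds about 85,000 words, consistent with the book-length comparison in the text.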

Impressive new capabilities emerge above a threshold of roughly 100K tokens, such as self-reflection, repository-level code understanding, multi-document question answering, and LLM-driven autonomous agents. IBM Granite 3.1’s extended context length therefore supports a far wider variety of enterprise use cases, from examining thousands of financial transactions at once to processing entire code bases and large legal documents.

Granite Guardian 3.1: detecting hallucinations in agentic workflows

Granite Guardian 3.1 8B and Granite Guardian 3.1 2B can detect hallucinations that may arise in an agentic workflow, bringing the same accountability and trust to function calling that Granite Guardian already provides for RAG.

Between the first request made to an AI agent and the output the agent ultimately returns to the user, a number of steps and subprocesses take place. Granite Guardian 3.1 models monitor each function call for syntactic and semantic hallucinations, providing oversight throughout.

For example, when an AI agent supposedly queries an external information source, Granite Guardian 3.1 checks for fabricated information flows. If an agentic workflow involves intermediate calculations using figures drawn from a bank record, Granite Guardian 3.1 verifies that the agent made the right function call with the right numbers.
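To make the idea of a “syntactic” function-call hallucination concrete, here is a minimal illustrative sketch, not the Granite Guardian API: a check that an agent’s proposed tool call names a declared tool and passes only declared parameters. The tool names and schema below are invented for illustration; semantic checks (did the agent use the right numbers?) require a model like Guardian itself.

```python
# Hypothetical tool schema an agent might be given (names invented here).
DECLARED_TOOLS = {
    "get_balance": {"account_id"},
    "transfer": {"from_account", "to_account", "amount"},
}

def is_syntactically_valid(call: dict) -> bool:
    """Return True if the call names a real tool and uses only declared args."""
    params = DECLARED_TOOLS.get(call.get("name"))
    if params is None:  # hallucinated tool name
        return False
    # Hallucinated argument names also fail the check.
    return set(call.get("arguments", {})) <= params

print(is_syntactically_valid(
    {"name": "get_balance", "arguments": {"account_id": "A1"}}))  # True
print(is_syntactically_valid(
    {"name": "get_ballance", "arguments": {}}))                   # False
```

A guard model goes well beyond this, but the sketch shows the kind of per-call validation that happens between the user’s request and the agent’s final answer.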

Today’s release is another step toward trust and accountability for every part of an LLM-based enterprise process. The latest Granite Guardian 3.1 models are available on Hugging Face; they will also be accessible through Ollama later this month and on IBM Watsonx.ai in January 2025.

Granite Embedding models

Embeddings are an essential component of the LLM ecosystem. Semantic search, vector search, RAG, and the maintenance of efficient vector databases all depend on an accurate and efficient way to represent words, queries, and documents in numerical form. A good embedding model can greatly improve a system’s understanding of user intent and surface more relevant information and sources in response to a query.

Over the past two years, open source autoregressive LLMs for tasks like text generation and summarisation have become increasingly competitive, yet major providers have released comparatively few open source embedding models.

The new Granite Embedding models build on the Slate family of encoder-only, RoBERTa-based language models. Trained with the same attention to detail and the same filtering of hate, abuse, and profanity (“HAP”) as the rest of the Granite series, Granite Embedding comes in four model sizes, two of which support multilingual embedding across 12 natural languages:

  • Granite-Embedding-30M-English
  • Granite-Embedding-125M-English
  • Granite-Embedding-107M-Multilingual
  • Granite-Embedding-278M-Multilingual
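The retrieval workloads these models serve boil down to comparing vectors. The sketch below uses tiny hand-made vectors so the mechanics are visible; in practice the vectors would come from an embedding model such as the 30M English model above (published on Hugging Face as ibm-granite/granite-embedding-30m-english), loaded for example via the sentence-transformers library:

```python
import numpy as np

# Rank documents by cosine similarity to a query vector — the core
# operation behind semantic search, vector search, and RAG retrieval.
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
query = np.array([1.0, 0.0, 1.0])
docs = {
    "doc_a": np.array([0.9, 0.1, 0.8]),  # points the same way as the query
    "doc_b": np.array([0.0, 1.0, 0.0]),  # orthogonal, i.e. unrelated topic
}

ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # doc_a
```

A vector database performs the same ranking at scale, with approximate nearest-neighbour indexes replacing the brute-force sort.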

Unlike the great majority of open embedding models on the Hugging Face MTEB leaderboard, which rely on training datasets licensed exclusively for research purposes, such as MS-MARCO, IBM confirmed that all of the data sources used to train Granite Embedding are commercially permissible. Underscoring that care for enterprise use, IBM backs Granite Embedding with the same uncapped indemnity for third-party intellectual property claims that it provides for other IBM-developed models.

Despite IBM’s careful curation and filtering of training material, the English Granite Embedding models kept pace with well-known open source embedding models of similar size in internal performance tests using the BEIR evaluation framework.

Comparison of average embedding model performance on retrieval tasks, conducted using standard BEIR test framework
Image credit to IBM

IBM testing also showed that two of the new embedding models, Granite-Embedding-30M-English and Granite-Embedding-107M-Multilingual, run inference considerably faster than competing models.

Average model inference speed in internal IBM testing under identical deployment conditions
Image credit to IBM

The open source Granite Embedding model family is the first step in an ambitious IBM Research roadmap of ongoing innovation. Updates and enhancements scheduled for 2025 include context extension, RAG optimisation, and multimodal retrieval capabilities.

Agentic AI and document decoding

Alongside the Granite series’ continuous improvement, IBM is demonstrating its dedication to open source AI through the recent creation and open source release of cutting-edge tools and frameworks for working with LLMs. These IBM-built tools are optimised for Granite models but open and model-agnostic by design, enabling developers to get the most out of LLMs, from building autonomous AI agents to preparing RAG sources and fine-tuning pipelines.

Docling: preparing documents for pretraining, fine-tuning, and RAG

Generative AI, from RAG to creative writing, ultimately runs on data. Large language models cannot reach their full potential when some of that data is locked in formats the models cannot understand. As a Washington Post headline put it a decade ago, “the solutions to all our problems may be buried in PDFs that nobody reads.” LLMs are relatively new; the problem is not.

This is why IBM Deep Search created Docling, a powerful tool for parsing documents in widely used formats like PDF, DOCX, images, PPTX, XLSX, HTML, and AsciiDoc, and converting them into model-friendly formats like Markdown or JSON. This lets models like Granite readily access those documents, and the data within them, for RAG and other pipelines. Docling also integrates with agentic frameworks like LlamaIndex, LangChain, and Bee, so developers can easily bring it into their preferred ecosystem.

Open sourced under the permissive MIT License, Docling is a sophisticated solution that goes beyond basic text extraction and optical character recognition (OCR). According to Red Hat’s William Caban, Docling applies several contextual and element-based preprocessing techniques: if a table spans multiple pages, Docling knows to extract it as a single table; if a page contains body text, images, and tables, each of these is extracted separately in accordance with its original context.
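A minimal sketch of the conversion flow, assuming the docling package is installed and using a placeholder filename ("report.pdf" is an assumption, not a file from this article); the guard keeps the sketch runnable even when no document is present:

```python
from pathlib import Path

SOURCE = Path("report.pdf")  # placeholder: any PDF, DOCX, PPTX, etc.

if SOURCE.exists():
    # Docling's documented entry point: convert a document, then export
    # it to a model-friendly format such as Markdown for RAG ingestion.
    from docling.document_converter import DocumentConverter  # pip install docling

    result = DocumentConverter().convert(SOURCE)
    markdown = result.document.export_to_markdown()
    print(markdown[:500])  # preview the Markdown handed to the LLM pipeline
else:
    markdown = None
    print(f"{SOURCE} not found - supply a real document to convert.")
```

The exported Markdown or JSON can then be chunked and embedded like any other text source in a RAG pipeline.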

The Docling team is currently developing new capabilities, such as the extraction of equations, code, and additional metadata.

Bee: an open-model agentic AI framework

Designed for use with Granite and Llama models, the Bee Agent Framework is an open source framework for building robust agentic AI workflows with open source LLMs; further model-specific optimisations are in development. It offers a range of modules that let developers customise nearly every aspect of an AI agent, from memory handling to tool use to error handling, along with observability features that provide the insight and accountability required for production deployment.

The framework integrates smoothly with multiple models and a variety of powerful, ready-to-use tools, such as weather services and internet search, as well as bespoke tools written in Python or JavaScript. Bee’s versatile tool use lets you tailor workflows to your unique situation, as in this recipe combining Granite and Wikipedia, which uses built-in tools to make better use of a constrained context window.
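The loop a framework like Bee manages on your behalf can be sketched generically. This is an illustration of the agentic tool-use pattern, not Bee’s own API (which is TypeScript); the model is replaced by a stub that decides to call a tool and then answers from the tool’s result:

```python
# Generic agentic loop: model proposes a tool call, the runtime executes
# it, the result is fed back, and the model produces a final answer.

def weather_tool(city: str) -> str:
    """Stand-in for a real weather service tool."""
    return f"Sunny in {city}"

TOOLS = {"weather": weather_tool}

def stub_model(history: list[dict]) -> dict:
    # A real agent would call an LLM here; this stub calls the weather
    # tool once, then answers with the tool's result.
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "weather", "args": {"city": "Paris"}}
    return {"answer": history[-1]["content"]}

def run_agent(question: str) -> str:
    history = [{"role": "user", "content": question}]
    while True:
        step = stub_model(history)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])  # execute the tool call
        history.append({"role": "tool", "content": result})

print(run_agent("Weather in Paris?"))  # Sunny in Paris
```

Frameworks like Bee add memory management, error handling, and observability around exactly this loop, which is why those modules matter for production use.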

Granite Bee agents can run locally via Ollama, or use hosted inference through Watsonx.ai.

IBM’s Granite timeseries forecasting on Watsonx.ai

Granite’s TinyTimeMixer (TTM) timeseries models, released earlier this year, are a family of lightweight, pre-trained models built on a novel architecture. In zero-shot and few-shot forecasting of anything from IoT sensor data to stock market pricing and energy demand, Granite Timeseries models outperform several models up to ten times their size, including TimesFM, Moirai, and Chronos. The granite-timeseries-TTM models have been downloaded over 3.25 million times on Hugging Face alone since May 30.

Following the beta release of the Watsonx.ai Timeseries Forecasting API and SDK in November, Granite timeseries models are now accessible on IBM’s integrated AI platform for end-to-end AI application development.

Getting started with IBM Granite 3.1

IBM Granite 3.1 models are now available on IBM Watsonx.ai, as well as through platform partners including Hugging Face, LM Studio, Ollama, Replicate, and Docker (via its DockerHub GenAI catalogue). In January 2025, selected IBM Granite 3.1 variants will also be offered by NVIDIA as NIM microservices.
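For local experimentation, one route is the ollama Python client. This is a sketch under two assumptions: that the Ollama daemon is running, and that the Granite 3.1 dense 8B model has been pulled under the tag "granite3.1-dense:8b" (the tag is an assumption; check Ollama's model library for the current name):

```python
def build_messages(prompt: str) -> list[dict]:
    """Shape a single-turn chat request for the client."""
    return [{"role": "user", "content": prompt}]

try:
    import ollama  # pip install ollama

    reply = ollama.chat(
        model="granite3.1-dense:8b",  # assumed tag; verify before use
        messages=build_messages("Summarise RAG in one sentence."),
    )
    print(reply["message"]["content"])
except Exception as exc:  # daemon not running, model not pulled, or package missing
    print(f"Ollama not available: {exc}")
```

The same message shape works against hosted endpoints such as Watsonx.ai, with only the client and model identifier changing.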

Drakshi
Since June 2023, Drakshi has been writing articles on Artificial Intelligence for Govindhtech. She holds a postgraduate degree in business administration and is an enthusiast of Artificial Intelligence.