Monday, February 17, 2025

Presenting The Falcon 3 Family: Unlocking AI Innovation

Hugging Face presents Falcon 3, a family of decoder-only large language models ranging from 1 to 10 billion parameters, created by Abu Dhabi’s Technology Innovation Institute (TII). The release reflects TII’s continued dedication to developing open and accessible foundation models by pushing the limits of performance and training efficiency. Falcon 3 is a natural progression from earlier iterations, with a focus on strengthening the models’ science, math, and coding capabilities.

There are five base models in this iteration:

  • Falcon3-1B-Base
  • Falcon3-3B-Base
  • Falcon3-Mamba-7B-Base
  • Falcon3-7B-Base
  • Falcon3-10B-Base

A number of significant innovations went into the development of these models, with the goal of enhancing performance and lowering training costs:

  • One pre-training run for the transformer-based models: Researchers performed a single, large-scale pretraining run on the 7B model, using 1,024 H100 GPU chips and 14 trillion tokens of web, code, STEM, and carefully curated high-quality multilingual data.
  • Depth upscaling for improved reasoning: Building on earlier research into the impact of model depth, the 7B model was upscaled to a 10B-parameter model by duplicating the redundant layers and continuing pre-training on 2 trillion tokens of high-quality data (see the sketch after this list). The result is Falcon3-10B-Base, which delivers state-of-the-art zero-shot and few-shot performance among models under 13B parameters.
  • Knowledge distillation for better small models: Pre-training efficiency was also redefined by combining pruning and knowledge distillation techniques to create Falcon3-1B-Base and Falcon3-3B-Base, compact and efficient alternatives trained on less than 100 gigatokens of curated high-quality data.
  • Pure SSM: Falcon Mamba 7B was further improved by training on an additional 1.5 trillion tokens of high-quality data, producing Falcon3-Mamba-7B-Base. Notably, the new model shows markedly better mathematics and reasoning capabilities.
  • Other variants: To provide flexibility for a variety of applications, all Falcon 3 family models are offered in variants such as Instruct, GGUF, GPTQ-Int4, GPTQ-Int8, AWQ, and 1.58-bit.
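
The depth-upscaling step can be illustrated with a short, hypothetical sketch (this is not TII's published recipe). It assumes a Llama-style causal language model loaded through Hugging Face transformers, whose decoder blocks are exposed as model.model.layers, and duplicates a contiguous span of layers before continued pre-training; the repository id and the span boundaries are illustrative assumptions.

# Hypothetical sketch of depth upscaling by layer duplication.
import copy
import torch
from torch import nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-7B-Base",      # assumed repository id
    torch_dtype=torch.bfloat16,
)

layers = model.model.layers        # nn.ModuleList of decoder blocks
start, end = 12, 24                # hypothetical span of layers to duplicate
duplicated = [copy.deepcopy(layers[i]) for i in range(start, end)]

# Rebuild the stack with the duplicated span inserted after the layers it copies.
new_layers = list(layers[:end]) + duplicated + list(layers[end:])
model.model.layers = nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)
# Note: a real implementation would also refresh per-layer bookkeeping
# (e.g. attention layer indices used for caching) before continued pre-training.

model.save_pretrained("falcon3-depth-upscaled-sketch")

The upscaled checkpoint would then be further pre-trained on high-quality tokens, as described above.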

Important Points

Falcon 3 pushes the boundaries of small- and medium-scale large language models, exhibiting excellent performance on widely used benchmarks:

  • Falcon3-1B-Base is comparable to gemma-2-2b and outperforms SmolLM2-1.7B.
  • Falcon3-3B-Base performs better than larger models such as Llama-3.1-8B and Minitron-4B-Base, demonstrating the benefits of knowledge-distillation pre-training.
  • Among models under the 9B scale, Falcon3-7B-Base performs best, comparable to Qwen2.5-7B.
  • Falcon3-10B-Base delivers impressive performance and is state of the art in the under-13B category.
  • All transformer-based Falcon 3 models are compatible with the Llama architecture, allowing better integration within the AI ecosystem (a loading sketch follows this list).
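
Because of this Llama-architecture compatibility, the transformer-based checkpoints can be loaded with the standard Hugging Face transformers API. The snippet below is a minimal sketch; the repository id tiiuae/Falcon3-7B-Instruct and the prompt are assumptions for illustration.

# Minimal sketch: loading and prompting a Falcon 3 instruct model with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-7B-Instruct"   # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Falcon 3 model family in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))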

With support for a longer 32K context length, Falcon3-Mamba-7B maintains its position as the best-performing State Space Language Model (SSLM), equaling or even outperforming top transformer-based LLMs at the 7B scale. Because Falcon3-Mamba-7B shares the architecture of the original Falcon Mamba 7B, users can integrate it easily and without further work.

Falcon3-7B-Instruct and Falcon3-10B-Instruct surpass all instruct models under the 13B scale on the open leaderboard, demonstrating the exceptional performance of the instruct versions of the base models across a variety of benchmarks.

Improved Capabilities

TII reports raw scores obtained from an in-house evaluation pipeline based on lm-evaluation-harness. These evaluations reflect the focus on improving performance in scientific domains, reasoning, and general knowledge, and highlight key areas where the Falcon 3 family of models excels:

Math Skills: Falcon3-10B-Base demonstrates improved reasoning on challenging math-focused tasks, with scores of 22.9 on MATH-Lvl5 and 83.0 on GSM8K.
Coding Proficiency: Falcon3-10B-Base scores 73.8 on MBPP and 45.8 on MultiPL-E, demonstrating its ability to generalize across programming-related tasks.

Extended Context Length: Falcon 3 family models support up to 32k tokens (with the exception of the 1B, which supports up to 8k context) and include functional enhancements such as a score of 86.3 on BFCL for Falcon3-10B-Instruct.

Better Reasoning: The Falcon3-7B-Base and Falcon3-10B-Base models score 51.0 and 59.7 on BBH, respectively, indicating superior reasoning ability. The 10B model outperforms the 7B in this regard.

Scientific Knowledge Expansion: Falcon3-7B-Base scores 67.4/39.2 (MMLU/MMLU-PRO), whereas Falcon3-10B-Base scores 73.1/42.5 (MMLU/MMLU-PRO), indicating improvements in specialized knowledge.

Model Specifications and Benchmark Findings

The following table summarizes the detailed specifications of the Falcon 3 series of models. The Falcon3-7B-Base architecture uses a head size of 256, which yields high throughput with FlashAttention-3, since that kernel is optimized for this dimension. These decoder-only models have a vocabulary of 131K tokens (65K for Mamba-7B) and range from 18 to 40 layers for the transformer-based models, with 64 layers for the Mamba model. All models use the SwiGLU activation function. Falcon3-7B-Base was trained on the largest amount of data, guaranteeing thorough coverage of concepts and knowledge, while the other variants require far less data.

Falcon3 series of models' detailed specs
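
As a quick check of these specifications for any given checkpoint, the model configuration can be inspected directly; the snippet below is a small sketch with an assumed repository id.

# Sketch: reading layer count, head size, and vocabulary size from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Base")  # assumed repository id

print("hidden layers:      ", config.num_hidden_layers)
print("attention heads:    ", config.num_attention_heads)
# Head size is the hidden size divided by the number of attention heads (expected: 256 for the 7B).
print("head size:          ", config.hidden_size // config.num_attention_heads)
print("vocabulary size:    ", config.vocab_size)
print("max context length: ", config.max_position_embeddings)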

Instruct Models

Across the assessed benchmarks, Falcon3-1B-Instruct and Falcon3-3B-Instruct exhibit strong performance. For example, Falcon3-1B achieves competitive results on IFEval (54.4), MUSR (40.7), and SciQ (86.8), while Falcon3-3B shows further gains, especially on MMLU-PRO (29.7) and MATH (19.9), clearly reflecting scaling effects. Falcon models outperform both Qwen and Llama in reasoning and commonsense comprehension, although they do not beat every rival model on every criterion. Within this internal evaluation process:

  • The lm-evaluation-harness is used for all models.
  • Raw scores are reported, obtained by applying the chat template without fewshot_as_multiturn (unlike Llama3.1).
  • The same batch size is employed for every model (a sketch of a comparable evaluation run follows this list).
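
For readers who want to approximate this setup, the sketch below shows one way to drive lm-evaluation-harness from Python; the task list, batch size, and model id are illustrative assumptions, and the apply_chat_template / fewshot_as_multiturn options require a sufficiently recent lm-eval release.

# Sketch of an evaluation run with lm-evaluation-harness (package: lm_eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-7B-Instruct,dtype=bfloat16",  # assumed repository id
    tasks=["gsm8k", "ifeval"],        # illustrative task selection
    batch_size=8,                     # same batch size for every model, as noted above
    apply_chat_template=True,         # chat template applied...
    fewshot_as_multiturn=False,       # ...without few-shot-as-multi-turn
)

print(results["results"])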

Additionally, Falcon3-7B and Falcon3-10B exhibit strong performance on all assessed benchmarks. Falcon3-7B achieves competitive scores on math (GSM8K: 79.1) and reasoning (ARC Challenge: 65.9, MUSR: 46.4), while Falcon3-10B shows significant gains, particularly on GSM8K (83.1) and IFEval (78), demonstrating strong scaling benefits.

Open Source Dedication

All models in the Falcon 3 family are made available under the Falcon LLM license, which is consistent with the objective of promoting AI accessibility and cooperation.

TII believes these models will be useful to the AI community for future experiments, application development, and research. Falcon 3 is a continuation, not a conclusion, of the effort to develop more specialized, effective, and capable foundation models. Other Falcon 3 family models with improved multi-modal capabilities, such as support for images, videos, and audio, along with a comprehensive technical report detailing the methodologies, will be released in January 2025.
