Wednesday, February 12, 2025

DeepSeek R1: A Revolutionary LLM for Complex Reasoning

Overview

DeepSeek presents its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained with large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated impressive reasoning performance. Through RL, it spontaneously developed a number of strong and intriguing reasoning behaviours. However, DeepSeek-R1-Zero also suffers from issues such as language mixing, poor readability, and endless repetition.

To overcome these problems and further improve reasoning performance, DeepSeek presents DeepSeek-R1, which incorporates cold-start data before RL. On math, programming, and reasoning tasks, DeepSeek-R1 performs on par with OpenAI-o1. To support the research community, DeepSeek has publicly released DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B achieves new state-of-the-art results for dense models, surpassing OpenAI-o1-mini on many benchmarks.

Model Summary

Post-Training: Large-Scale Reinforcement Learning on the Base Model

Instead of using supervised fine-tuning (SFT) as a preliminary step, DeepSeek applies reinforcement learning (RL) directly to the base model. This approach lets the model explore chain-of-thought (CoT) reasoning for solving complex problems, leading to the creation of DeepSeek-R1-Zero. DeepSeek-R1-Zero exhibits skills including self-verification, reflection, and the generation of lengthy CoTs, an important milestone for the research community. Notably, it is the first publicly available study to confirm that LLMs' reasoning abilities can be incentivised purely through RL, without SFT. This discovery opens the way for future developments in the field.

DeepSeek presents the pipeline used to create DeepSeek-R1. It includes two RL stages aimed at discovering better reasoning patterns and aligning with human preferences, as well as two SFT stages that seed the model's reasoning and non-reasoning capabilities.

Distillation: Even Smaller Models Have Potential

  • DeepSeek shows that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered by running RL on small models directly. The open-source DeepSeek-R1 and its API will enable the research community to distil better small models in the future.
  • Using reasoning data generated by DeepSeek-R1, DeepSeek fine-tuned several dense models that are widely used in the research community. The evaluation results show that the distilled smaller dense models perform remarkably well on benchmarks. Based on the Qwen2.5 and Llama3 series, the distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints have been made publicly available.
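The distillation described above amounts to ordinary supervised fine-tuning on teacher-generated reasoning traces. The sketch below shows one plausible way to format such traces into prompt/completion pairs; the record fields, the `<think>` wrapper, and the example data are illustrative assumptions, not DeepSeek's actual pipeline.

```python
# Minimal sketch of preparing distillation data: reasoning traces generated by
# a teacher model (such as DeepSeek-R1) are formatted into plain SFT examples
# for a smaller student model. Field names and the <think> wrapper are assumed
# for illustration.

def to_sft_example(record: dict) -> dict:
    """Turn one teacher-generated sample into a prompt/completion pair."""
    prompt = record["question"]
    # The teacher's chain of thought and final answer become the target text.
    completion = f"<think>\n{record['reasoning']}\n</think>\n{record['answer']}"
    return {"prompt": prompt, "completion": completion}

# A toy teacher sample; a real corpus would hold hundreds of thousands.
teacher_samples = [
    {
        "question": "What is 12 * 13?",
        "reasoning": "12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
        "answer": "156",
    }
]

sft_dataset = [to_sft_example(s) for s in teacher_samples]
print(sft_dataset[0]["completion"])
```

The student is then fine-tuned on these pairs with a standard SFT trainer, so no RL machinery is needed on the student side.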

Model Downloads

DeepSeek-R1 Models

Model | Total Params | Activated Params | Context Length | Download
DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace
DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace

DeepSeek-V3-Base serves as the basis for training DeepSeek-R1-Zero and DeepSeek-R1.

Model | Base Model | Download
DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace
DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace
DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace
DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace
DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace
DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace

DeepSeek-R1-Distill models are fine-tuned from open-source base models using samples generated by DeepSeek-R1. DeepSeek made minor adjustments to their tokenisers and configurations; please run these models with DeepSeek's settings.

Evaluation Findings

DeepSeek-R1 Evaluation

The maximum generation length for all models is 32,768 tokens. For benchmarks that require sampling, pass@1 is estimated using a temperature of 0.6, a top-p value of 0.9, and 64 responses per query.
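Estimating pass@1 from k samples simply means grading each of the k responses, taking the fraction correct per query, and averaging over queries. A minimal sketch, with the grading logic assumed to be problem-specific:

```python
# Minimal sketch of estimating pass@1 from k sampled responses per query.
# results[i][j] is True if sample j for query i was graded correct; grading
# itself (answer extraction, exact match, etc.) is benchmark-specific and
# assumed to have happened already.

def pass_at_1(results: list[list[bool]]) -> float:
    """Average, over queries, of the fraction of correct samples."""
    per_query = [sum(r) / len(r) for r in results]
    return sum(per_query) / len(per_query)

# Two queries with 4 samples each (the paper's setup uses 64 per query).
results = [
    [True, True, False, True],    # 3/4 correct
    [False, False, True, False],  # 1/4 correct
]
print(pass_at_1(results))  # → 0.5
```

Sampling with a nonzero temperature and averaging in this way gives a lower-variance estimate than a single greedy decode per query.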

Chat Platform & Website

You can chat with DeepSeek-R1 on DeepSeek's official website, chat.deepseek.com, by enabling the “DeepThink” button.

Additionally, DeepSeek offers an OpenAI-compatible API on the DeepSeek Platform at platform.deepseek.com.
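Because the API is OpenAI-compatible, a request body takes the standard chat completions shape. The sketch below shows such a body; the model identifier "deepseek-reasoner" and the exact endpoint details are assumptions, so check the platform documentation for the current values.

```python
import json

# Minimal sketch of a request body for an OpenAI-compatible chat completions
# endpoint. The model identifier below is an assumption; consult the DeepSeek
# Platform documentation for the real one.
payload = {
    "model": "deepseek-reasoner",  # assumed identifier for DeepSeek-R1
    "messages": [
        {"role": "user", "content": "How many primes are there below 100?"}
    ],
    "temperature": 0.6,
    "max_tokens": 32768,
}

# This JSON string would be POSTed to the chat completions endpoint with an
# Authorization: Bearer <API key> header.
body = json.dumps(payload)
print(body)
```

Any OpenAI-compatible client library can be pointed at the platform's base URL and reuse this payload unchanged.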

Running Locally

DeepSeek-R1 Models

For further details on how to run DeepSeek R1 locally, please see the DeepSeek-V3 repository.

DeepSeek-R1-Distill Models

DeepSeek-R1-Distill models can be used similarly to Llama or Qwen models.

For example, vLLM makes it simple to launch a service.

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

SGLang also makes it simple to launch a service.

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2

Usage Suggestions

To achieve the expected performance when using the DeepSeek-R1 series models, including for benchmarking, the following configurations are advised:

  • Set the temperature between 0.5 and 0.7 (0.6 is recommended) to avoid incoherent outputs or endless repetition.
  • Include all instructions in the user prompt; do not use a system prompt.
  • For mathematical questions, it is best to include a directive in the prompt, such as “Please reason step by step, and put your final answer within \boxed{}.”
  • When evaluating model performance, run several tests and average the results.
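The suggestions above can be collected into a small request-building helper. This is an illustrative sketch, not an official API: the helper name and request shape are assumptions, while the temperature and the \boxed{} directive come straight from the list above.

```python
# Minimal sketch applying the usage suggestions above: all instructions go into
# a single user message (no system prompt), the temperature stays at the
# recommended 0.6, and math questions get the step-by-step directive appended.

MATH_DIRECTIVE = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)

def build_request(question: str, is_math: bool = False) -> dict:
    content = f"{question}\n{MATH_DIRECTIVE}" if is_math else question
    return {
        "messages": [{"role": "user", "content": content}],  # no system prompt
        "temperature": 0.6,  # recommended; stay within 0.5-0.7
    }

req = build_request("Solve x^2 - 5x + 6 = 0.", is_math=True)
print(req["messages"][0]["content"])
```

For benchmarking, the same request would be sampled several times and the results averaged, as the last suggestion advises.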

License

Both this code repository and the model weights are governed by the MIT License. The DeepSeek-R1 series supports commercial use and permits modifications and derivative works, including distillation for training other LLMs. Please note that:

  • DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B, and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen2.5 series, originally licensed under the Apache 2.0 License, and have been fine-tuned with 800k samples curated with DeepSeek-R1.
  • DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base, originally licensed under the Llama3.1 license.
  • DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct, originally licensed under the Llama3.3 license.
Drakshi
Drakshi has been writing articles on Artificial Intelligence for Govindhtech since June 2023. She holds a postgraduate degree in business administration and is an enthusiast of Artificial Intelligence.