Overview
DeepSeek presents its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. Through RL, DeepSeek-R1-Zero spontaneously developed numerous powerful and intriguing reasoning behaviours. However, it suffers from issues such as endless repetition, poor readability, and language mixing.
To address these issues and further improve reasoning performance, DeepSeek introduces DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 on math, code, and reasoning tasks. To support the research community, DeepSeek has made DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 (based on Llama and Qwen) publicly available. DeepSeek-R1-Distill-Qwen-32B surpasses OpenAI-o1-mini across many benchmarks, achieving new state-of-the-art results for dense models.
Model Summary
Post-Training: Large-Scale Reinforcement Learning on the Base Model
DeepSeek applies reinforcement learning (RL) directly to the base model, rather than using supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking an important milestone for the research community. Notably, it is the first open research to confirm that the reasoning capabilities of LLMs can be incentivised purely through RL, without SFT. This finding paves the way for future advances in the area.
DeepSeek also presents the pipeline used to develop DeepSeek-R1. The pipeline incorporates two RL stages, aimed at discovering better reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the foundation for the model's reasoning and non-reasoning capabilities.
Distillation: Even Smaller Models Have Potential
- DeepSeek demonstrates that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1 and its API will enable the research community to distil better small models in the future.
- Using the reasoning data generated by DeepSeek-R1, the team fine-tuned several dense models that are widely used in the research community. The evaluation results show that the distilled smaller dense models perform exceptionally well on benchmarks. The distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints, based on the Qwen2.5 and Llama3 series, are publicly available.
Model Downloads
DeepSeek-R1 Models
| Model | Total Params | Activated Params | Context Length | Download |
|---|---|---|---|---|
| DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace |
| DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-V3-Base serves as the basis for training DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Distill Models
| Model | Base Model | Download |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned from open-source base models using samples generated by DeepSeek-R1. DeepSeek made slight changes to their configurations and tokenisers; please use DeepSeek's settings when running these models.
Evaluation Results
DeepSeek-R1 Evaluation
For all models, the maximum generation length is 32,768 tokens. For benchmarks that require sampling, pass@1 is estimated using a temperature of 0.6, a top-p value of 0.95, and 64 responses per query.
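Under this protocol, pass@1 is simply the average correctness over the sampled responses. A minimal sketch of the computation, assuming each response has already been judged correct or incorrect by a benchmark-specific grader:

```python
# Minimal sketch of the pass@1 estimate described above: sample k
# responses per query, judge each one, and average the per-response
# correctness. How each response is judged is benchmark-specific and
# assumed to happen elsewhere.
from statistics import mean

def pass_at_1(correct_flags: list[bool]) -> float:
    """Fraction of the k sampled responses judged correct."""
    return mean(1.0 if flag else 0.0 for flag in correct_flags)

# e.g. 48 of 64 sampled responses judged correct:
print(pass_at_1([True] * 48 + [False] * 16))  # 0.75
```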
Chat Platform & Website
You can chat with DeepSeek-R1 on DeepSeek's official website, chat.deepseek.com, by toggling the "DeepThink" button.
DeepSeek also provides an OpenAI-compatible API on the DeepSeek Platform at platform.deepseek.com.
How to Run Locally
DeepSeek-R1 Models
For further details on how to run DeepSeek-R1 locally, please refer to the DeepSeek-V3 repository.
DeepSeek-R1-Distill Models
DeepSeek-R1-Distill models can be used similarly to Llama or Qwen models.
For example, vLLM makes it simple to launch a service.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
SGLang also makes it simple to launch a service.
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
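Once a server is running, it can be queried over the OpenAI-compatible chat-completions route that both vLLM and SGLang expose. A standard-library-only sketch, assuming vLLM's default port 8000 (SGLang defaults to a different port, so adjust the URL to match your launch command):

```python
# Sketch of querying a locally launched server over its OpenAI-compatible
# chat-completions route. The URL below assumes vLLM's default port
# (8000); change host/port to match your own launch command.
import json
import urllib.request

def query_local_server(
    prompt: str,
    url: str = "http://localhost:8000/v1/chat/completions",
    model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],  # user turn only
        "temperature": 0.6,
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```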
Usage Suggestions
To achieve the expected performance when using the DeepSeek-R1 series models, including benchmarking, the following configurations are recommended:
- Set the temperature within the range 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent outputs.
- Avoid adding a system prompt; all instructions should be contained in the user prompt.
- For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
- When evaluating model performance, it is recommended to conduct multiple tests and average the results.
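The suggestions above can be folded into a small helper that builds a request body in the OpenAI-style chat format. The `build_request` helper and its defaults are illustrative, not part of any official SDK:

```python
# Illustrative helper encoding the usage suggestions above: temperature
# in the recommended range, no system message, and the suggested
# step-by-step directive appended for math questions.
MATH_DIRECTIVE = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)

def build_request(question: str, math: bool = False) -> dict:
    content = f"{question}\n{MATH_DIRECTIVE}" if math else question
    return {
        "temperature": 0.6,  # recommended range is 0.5-0.7
        "messages": [
            # All instructions go in the user turn; no system prompt.
            {"role": "user", "content": content}
        ],
    }

request = build_request("Solve x^2 = 4.", math=True)
print(request["messages"][0]["role"])  # user
```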
License
Both this code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and permits modifications and derivative works, including distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B, and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen2.5 series, originally licensed under the Apache 2.0 License, and are fine-tuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and was originally licensed under the Llama 3.1 license.
- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and was originally licensed under the Llama 3.3 license.