What specific machine learning algorithms does DeepSeek use for formula generation


DeepSeek's models, most notably DeepSeek R1, rely on a combination of training techniques for formula generation and mathematical reasoning. The main ones are:

1. Chain-of-Thought (CoT) Prompts: DeepSeek R1 relies on long chains of thought to improve mathematical reasoning. Prompts are structured to guide the model through a problem step by step, similar to how humans work through complex problems[2]. Training on synthetic data generated from these CoT prompts improves the model's ability to solve mathematical problems, reportedly to the point of rivaling larger models.

2. Cold Start Fine-Tuning: Training begins with cold start fine-tuning on a compact dataset of step-by-step reasoning examples. This stage gives the model a structured approach to problem-solving before reinforcement learning starts[6][8].

3. Reinforcement Learning (RL): Following the initial fine-tuning, DeepSeek R1 applies large-scale reinforcement learning to enhance its reasoning skills. (Strictly, "pure" RL with no supervised cold start describes the related DeepSeek-R1-Zero model.) Sample answers are scored automatically, and the scores nudge the model toward desired behaviors, such as providing step-by-step solutions to mathematical problems[7][8]. Because the scoring is automatic, the model can improve its reasoning without large amounts of human-labeled data.

4. Rejection Sampling and Supervised Fine-Tuning: Near convergence of the RL process, DeepSeek R1 uses rejection sampling to generate synthetic data: the model produces many candidate outputs and only the best ones are kept. This synthetic data is then merged with supervised data from various domains to further refine the model's knowledge and accuracy[8]. The model therefore learns from both its own high-quality outputs and diverse domain-specific knowledge.

5. Group Relative Policy Optimization (GRPO): DeepSeek developed GRPO, an algorithm designed to make reinforcement learning more efficient. Instead of training a separate critic model, GRPO samples a group of answers for each prompt and scores each answer relative to the rest of its group. This lowers the cost of training and allows the model to perform well across multiple benchmarks[7]. GRPO is a general RL training method rather than one specific to formula generation, but it underpins the RL stages described above.

6. Multi-Stage Training Process: The multi-stage training approach adopted by DeepSeek R1 involves iterating through different stages of fine-tuning and reinforcement learning. Each stage builds upon the previous one, ensuring that the model develops robust reasoning capabilities while maintaining readability and coherence[6][8]. This structured approach is key to achieving high performance in mathematical reasoning tasks.
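
The scoring, group-relative comparison, and rejection-sampling steps described in items 3 to 5 can be illustrated with a minimal Python sketch. It is not DeepSeek's implementation: the `Answer:` output convention, the function names, the exact-match reward, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Hypothetical convention: the final line of a completion reads "Answer: <value>".
ANSWER_RE = re.compile(r"Answer:\s*(.+?)\s*$")


def score_answer(completion: str, reference: str) -> float:
    """Rule-based reward: 1.0 if the final 'Answer:' line matches the reference, else 0.0."""
    lines = completion.strip().splitlines()
    if not lines:
        return 0.0
    match = ANSWER_RE.search(lines[-1])
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each reward relative to its own group: (r - mean) / (std + eps).

    If every reward in the group is identical, all advantages are 0.0.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


def rejection_sample(completions: list[str], reference: str) -> list[str]:
    """Keep only the completions that the reward function scores as correct."""
    return [c for c in completions if score_answer(c, reference) == 1.0]


# One prompt ("What is 2 + 2?") with a group of four sampled completions.
group = [
    "2 + 2 = 4\nAnswer: 4",
    "2 + 2 = 5\nAnswer: 5",
    "Step 1: add the numbers.\nAnswer: 4",
    "I am not sure.",
]
rewards = [score_answer(c, "4") for c in group]
advantages = group_relative_advantages(rewards)
kept = rejection_sample(group, "4")

print(rewards)
print(advantages)
print(len(kept))
```

In this sketch, correct completions receive positive advantages and incorrect ones negative advantages without any learned critic, and the completions kept by `rejection_sample` are the kind of synthetic examples that could feed a later supervised fine-tuning round.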

In summary, DeepSeek's formula generation capabilities rest on chain-of-thought training data, staged fine-tuning, and reinforcement learning driven by GRPO. Together, these methods enable DeepSeek models to perform strongly on mathematical reasoning and problem-solving tasks.

Citations:
[1] https://kalm.works/en/contents/technology/what-is-deepseek-differences-from-chatgpt-and-use-cases
[2] https://www.youtube.com/watch?v=Pabqg33sUrg
[3] https://planetbanatt.net/articles/deepseek.html
[4] https://martinfowler.com/articles/deepseek-papers.html
[5] https://www.inferless.com/learn/the-ultimate-guide-to-deepseek-models
[6] https://www.reddit.com/r/singularity/comments/1i9lkbh/deepseekr1_how_did_they_make_an_openailevel/
[7] https://www.technologyreview.com/2025/01/31/1110740/how-deepseek-ripped-up-the-ai-playbook-and-why-everyones-going-to-follow-it/
[8] https://www.vellum.ai/blog/the-training-of-deepseek-r1-and-ways-to-use-it