DeepSeek-R1 addresses language mixing through a multi-stage training pipeline that combines reinforcement learning (RL) with supervised fine-tuning (SFT). Training begins from a base model (DeepSeek-V3-Base) that is fine-tuned on a small set of "cold-start" data, which establishes a structured foundation for reasoning across diverse languages and contexts[1][2].
The model is then trained with large-scale RL to strengthen its reasoning. During this stage, language mixing was observed, particularly on queries in languages other than English or Chinese; for example, DeepSeek-R1 may default to English for its reasoning and response even when the prompt is in another language[5][6]. To mitigate this, the authors introduce a language-consistency reward during RL, computed from the proportion of target-language words in the chain of thought[5]. In a later stage, rejection sampling over successful RL outputs produces synthetic training data, which is merged with high-quality supervised data from other domains; this further improves the model's adaptability across languages and reduces language mixing in responses[1][4].
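The word-ratio idea behind the paper's language-consistency signal can be sketched as a simple filter over candidate reasoning traces. This is a toy illustration: the script-based heuristic, the `language_consistency` helper, and the 0.9 threshold are my own stand-ins, not DeepSeek's actual reward or pipeline.

```python
import re

# Crude script-based heuristic standing in for the word-ratio
# language-consistency reward described in the R1 paper [5].
LATIN = re.compile(r"[A-Za-z0-9.,;:'!?()\"-]+")
CJK = re.compile(r"[\u4e00-\u9fff]")

def language_consistency(text: str, target: str = "en") -> float:
    """Fraction of whitespace tokens that look like the target language."""
    tokens = text.split()
    if not tokens:
        return 0.0
    if target == "en":
        ok = sum(1 for t in tokens if LATIN.fullmatch(t))
    else:  # treat any token containing a CJK character as Chinese
        ok = sum(1 for t in tokens if CJK.search(t))
    return ok / len(tokens)

def rejection_sample(candidates, target="en", threshold=0.9):
    """Keep only candidate reasoning traces that stay in the target language."""
    return [c for c in candidates if language_consistency(c, target) >= threshold]

samples = [
    "The answer is 42 because the series telescopes.",
    "答案是 42, 因为 the series telescopes.",  # mixed-language trace, filtered out
]
kept = rejection_sample(samples, target="en")
```

In practice the reward would be one term in the RL objective rather than a hard filter, and a production system would use a proper language-identification model instead of a script heuristic.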
Ongoing work aims to improve DeepSeek-R1's performance across a wider range of languages and to address its remaining limitations with mixed-language inputs; future updates are expected to refine its accuracy and responsiveness in diverse linguistic contexts[2][5].
Citations:
[1] https://www.vellum.ai/blog/the-training-of-deepseek-r1-and-ways-to-use-it
[2] https://adasci.org/mastering-llms-reasoning-capability-with-deepseek-r1/
[3] https://www.reddit.com/r/LocalLLaMA/comments/1iagr6u/how_do_i_finetune_deepseek_r1_on_a_new_language/
[4] https://unfoldai.com/deepseek-r1/
[5] https://arxiv.org/html/2501.12948v1
[6] https://github.com/deepseek-ai/DeepSeek-R1/blob/main/README.md
[7] https://github.com/deepseek-ai/DeepSeek-R1/activity
[8] https://arxiv.org/html/2501.17030v1