DeepSeek R1 Model Performance on AIME Benchmark - Comparison with OpenAI o1 and Other AI Models

How does DeepSeek's performance on AIME 1 compare to other AI models?

DeepSeek's R1 model performs strongly on benchmarks built from the American Invitational Mathematics Examination (AIME), a prestigious mathematics competition for high school students. The comparison below sets DeepSeek R1's AIME results against those of other AI models:

1. DeepSeek R1 vs. OpenAI o1: DeepSeek R1 scored 79.8% on AIME 2024, slightly ahead of OpenAI o1-1217 at 79.2%[9]. Another comparison, however, reports a higher score of 96.7% for OpenAI o1, which suggests that the two sources differ in the metric or the model version used[8]. DeepSeek R1-Zero, a precursor model, scored 71.0% on AIME 2024, slightly below OpenAI o1-0912 but above o1-mini[1].

2. Comparison with Other Models: In a broader comparison, DeepSeek R1 performed well but was not the top scorer. OpenAI o3 Mini took the top spot with 86.5% accuracy on AIME, followed by DeepSeek R1 and o1[2]. So while DeepSeek R1 is competitive, it does not always outperform newer models such as o3 Mini.

3. Performance Variability: Model performance on AIME can vary significantly depending on which year's exam is used. For example, models generally scored higher on the older AIME 2024 questions than on the newer AIME 2025 questions, possibly because earlier questions were included in their training data[2].

4. Reasoning Capabilities: DeepSeek R1's strong performance on AIME is attributed to its advanced reasoning capabilities, which allow it to tackle complex mathematical problems effectively. However, its performance can decline on variants of the questions that require deeper logical reasoning[7].
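
To put the AIME 2024 figures quoted above in perspective, the following is a minimal Python sketch that converts score gaps from percentage points into an equivalent number of exam problems. It assumes the benchmark pools the 30 problems of AIME I and AIME II (15 each); the cited sources may use a different problem set or average over multiple samples, so the output is a rough illustration only. The names `AIME_2024_SCORES`, `gap_in_points`, and `gap_in_problems` are hypothetical and chosen for this example.

```python
# Scores quoted in the text above (percent accuracy on AIME 2024).
AIME_2024_SCORES = {
    "DeepSeek R1": 79.8,
    "OpenAI o1-1217": 79.2,
    "DeepSeek R1-Zero": 71.0,
}

# Assumption: the benchmark pools AIME I and AIME II, 15 problems each.
NUM_PROBLEMS = 30


def gap_in_points(scores, model_a, model_b):
    """Return model_a's score minus model_b's score, in percentage points."""
    return round(scores[model_a] - scores[model_b], 1)


def gap_in_problems(scores, model_a, model_b, num_problems=NUM_PROBLEMS):
    """Convert a percentage-point gap into an equivalent number of problems."""
    return gap_in_points(scores, model_a, model_b) / 100 * num_problems


if __name__ == "__main__":
    for other in ("OpenAI o1-1217", "DeepSeek R1-Zero"):
        points = gap_in_points(AIME_2024_SCORES, "DeepSeek R1", other)
        problems = gap_in_problems(AIME_2024_SCORES, "DeepSeek R1", other)
        print(f"DeepSeek R1 vs {other}: {points:+.1f} points (~{problems:.2f} problems)")
```

Under that assumption, one problem is worth about 3.3 percentage points, so the 0.6-point lead over o1-1217 corresponds to well under one problem, while the gap to R1-Zero is between two and three problems.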

Overall, DeepSeek R1 demonstrates robust performance on AIME, competing closely with other top models like OpenAI o1, though it does not lead in every comparison. Its open-source nature and cost efficiency make it an attractive choice for developers looking to leverage advanced reasoning capabilities in mathematics.

Citations:
[1] https://www.prompthub.us/blog/deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
[2] https://www.vals.ai/benchmarks/aime-2025-03-11
[3] https://www.vellum.ai/blog/analysis-openai-o1-vs-deepseek-r1
[4] https://artificialanalysis.ai/models/deepseek-r1
[5] https://techcrunch.com/2025/01/27/deepseek-claims-its-reasoning-model-beats-openais-o1-on-certain-benchmarks/
[6] https://www.byteplus.com/en/topic/386612
[7] https://www.reddit.com/r/LocalLLaMA/comments/1ibxhwp/deepseekr1distillqwen32b_2024_aime_i_performance/
[8] https://www.techloy.com/deepseek-r1-v-openai-o1-which-ai-model-is-better/
[9] https://www.datacamp.com/blog/deepseek-r1