DeepSeek-R1 vs GPT-4o on AIME 2024 Benchmark: Performance Comparison

How does DeepSeek-R1's performance on the AIME 2024 benchmark compare to that of other models, such as GPT-4o-0513?

DeepSeek-R1's performance on the AIME 2024 benchmark is notable: it scores 79.8%, slightly ahead of OpenAI o1-1217 at 79.2%[1]. However, the available sources offer little direct comparison between DeepSeek-R1 and GPT-4o-0513 on AIME 2024.
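To put the 0.6-point gap in perspective, the sketch below converts both percentages into an expected number of problems solved. It assumes the benchmark consists of the 30 problems of AIME 2024 (AIME I and AIME II, 15 problems each) and treats the reported scores as average accuracy across that set, so fractional counts are expected. The function and variable names are illustrative only.

```python
# Illustrative sketch: convert an accuracy percentage into the expected number
# of problems solved. ASSUMPTION: AIME 2024 = 30 problems (AIME I + AIME II).
AIME_2024_PROBLEMS = 30


def expected_solved(accuracy_pct: float, n_problems: int = AIME_2024_PROBLEMS) -> float:
    """Return the expected number of problems solved at the given accuracy (%)."""
    return accuracy_pct / 100 * n_problems


# Scores as quoted in the text above.
scores = {"DeepSeek-R1": 79.8, "OpenAI o1-1217": 79.2}

for model, pct in scores.items():
    print(f"{model}: {expected_solved(pct):.2f} of {AIME_2024_PROBLEMS} problems")

# Difference between the two models, expressed in problems rather than points.
gap = expected_solved(scores["DeepSeek-R1"]) - expected_solved(scores["OpenAI o1-1217"])
print(f"Gap: {gap:.2f} problems")
```

Under these assumptions the gap works out to about 0.18 problems, well under a single question, which is why "slightly ahead" is the fair reading of the two scores.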

GPT-4o models are known for robust performance across a wide range of tasks, but the available information does not detail GPT-4o-0513's results on AIME 2024. These models are typically strongest at general language understanding and generation; on specialized mathematical reasoning benchmarks such as AIME, they may trail models that are specifically optimized for reasoning, such as DeepSeek-R1.

DeepSeek-R1's strong AIME 2024 result can be attributed largely to its training approach, which uses large-scale reinforcement learning to strengthen reasoning capabilities. This lets it excel at tasks that require advanced multi-step mathematical reasoning[1][3]. GPT-4o models, by contrast, are general-purpose and may not be as specialized for mathematical reasoning.

Overall, DeepSeek-R1 edges out OpenAI o1-1217 on AIME 2024, but the available data do not provide an explicit comparison with GPT-4o-0513. Even so, DeepSeek-R1's reasoning-focused training likely contributes to its strong showing on mathematical reasoning benchmarks.

Citations:
[1] https://www.datacamp.com/blog/deepseek-r1
[2] https://blog.getbind.co/2025/01/23/deepseek-r1-vs-gpt-o1-vs-claude-3-5-sonnet-which-is-best-for-coding/
[3] https://docsbot.ai/models/compare/gpt-4o-2024-05-13/deepseek-r1
[4] https://artificialanalysis.ai/models/deepseek-r1
[5] https://docsbot.ai/models/compare/deepseek-r1/gpt-4o
[6] https://llm-stats.com/models/compare/deepseek-r1-vs-gpt-4o-2024-08-06
[7] https://www.prompthub.us/blog/deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
[8] https://docsbot.ai/models/compare/gpt-4o/deepseek-r1