
"The model, DeepSeekMath-V2, scored 118 out of 120 points on questions from the 2024 William Lowell Putnam Mathematical Competition, beating the top human score of 90. The model also performed at the level of gold-medal winners in the International Mathematical Olympiad (IMO) 2025 and the 2024 China Mathematical Olympiad. The results are described in a preprint posted on arXiv on 27 November."
"Early approaches to training large language models for mathematical reasoning focused on the accuracy of final answers, the preprint authors write. But a correct answer does not guarantee correct reasoning. At times, a correct final answer might just be a result of a fortunate error. Moreover, an exclusive focus on the end result is not useful in proving mathematical laws or formulae, when the logical reasoning is more important than the final answer."
""We are at a point where AI is about as good at maths as a smart undergraduate student," says Kevin Buzzard, a mathematician at Imperial College London. "It is very exciting." In February, AlphaGeometry 2, an AI problem solver created by Google DeepMind in London, also achieved a gold-level performance in the IMO. The feat was repeated in July by Gemini's Deep Think, which is owned by DeepMind."
DeepSeekMath-V2 is a mathematical reasoning model that identifies and corrects its own errors and attains elite competition performance. It scored 118 out of 120 on the 2024 William Lowell Putnam Mathematical Competition, outperforming the top human score of 90, and matched gold-medal performance at the 2025 IMO and the 2024 China Mathematical Olympiad. The model implements self-verifiable reasoning through a verifier trained to evaluate mathematical proofs, and its training shifts emphasis from final-answer accuracy toward rewarding sound logical reasoning, because a correct final answer can sometimes arise from a fortunate error. A preprint describing these results was posted on arXiv on 27 November.
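To give a concrete sense of the reward shift described above, the toy Python sketch below contrasts an outcome-only reward with one that also weights verifier-checked proof steps. The `Step` dataclass, the hand-labelled validity flags, and the 50/50 weighting are illustrative assumptions for this sketch, not DeepSeekMath-V2's actual training setup, where a learned verifier evaluates whole proofs.

```python
# Toy sketch: outcome-only reward vs. a reward that also credits verified
# reasoning steps. All names and weights here are illustrative assumptions,
# not DeepSeekMath-V2's actual training code.

from dataclasses import dataclass


@dataclass
class Step:
    claim: str
    is_valid: bool  # in practice, a learned verifier would estimate this


def final_answer_reward(predicted: str, reference: str) -> float:
    """Outcome-only reward: 1 if the final answer matches, else 0.

    A lucky guess after flawed reasoning still earns full reward.
    """
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def verified_reasoning_reward(steps: list[Step], predicted: str, reference: str) -> float:
    """Process-aware reward: blend answer correctness with the fraction of
    proof steps the verifier accepts, so invalid reasoning is penalized even
    when the final answer happens to be right."""
    if not steps:
        return 0.0
    step_score = sum(s.is_valid for s in steps) / len(steps)
    answer_score = final_answer_reward(predicted, reference)
    return 0.5 * step_score + 0.5 * answer_score


if __name__ == "__main__":
    # A correct answer reached through one invalid step scores lower under
    # the process-aware reward than under the outcome-only reward.
    proof = [
        Step("Let n be even, so n = 2k.", True),
        Step("Then n^2 = 2k^2.", False),  # algebra slip
        Step("Hence n^2 is divisible by 4.", True),
    ]
    print(final_answer_reward("divisible by 4", "divisible by 4"))                 # 1.0
    print(verified_reasoning_reward(proof, "divisible by 4", "divisible by 4"))    # ~0.83
```

The point of the sketch is only the contrast: an outcome-only reward cannot distinguish the flawed derivation from a sound one, while a step-weighted reward can.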
Read at Nature