AI models are starting to crack high-level math problems

"Over the weekend, Neel Somani, who is a software engineer, former quant researcher, and a startup founder, was testing the math skills of OpenAI's new model when he made an unexpected discovery. After pasting the problem into ChatGPT and letting it think for 15 minutes, he came back to a full solution. He evaluated the proof and formalized it with a tool called Harmonic - but it all checked out."

"AI tools have become ubiquitous in mathematics, from formalization-oriented LLMs like Harmonic's Aristotle to literature review tools like OpenAI's deep research. But since the release of GPT 5.2 - which Somani describes as 'anecdotally more skilled at mathematical reasoning than previous iterations' - the sheer volume of solved problems has become difficult to ignore, raising new questions about large language models' ability to push the frontiers of human knowledge."

Neel Somani tested OpenAI's new model on a challenging math problem and found that it produced a full solution after about fifteen minutes; he then evaluated the proof and formalized it with Harmonic, and it checked out. The model used chain-of-thought reasoning and cited results such as Legendre's formula, Bertrand's postulate, and the Star of David theorem. It located a 2013 MathOverflow post by Noam Elkies but produced a distinct and in some ways more complete proof for a version of an Erdős problem. Since the release of GPT-5.2, the volume of solved problems has become difficult to ignore, alongside broader use of AI tools in mathematics, from formalization-oriented LLMs such as Harmonic's Aristotle to literature-review tools such as OpenAI's deep research.

#large-language-models #mathematical-reasoning #erdos-problems #formal-verification

Read at TechCrunch


