#benchmark-results

from Techzine Global
2 days ago

Claude Sonnet 4.5 can code autonomously for 30 hours

Anthropic claims that Claude Sonnet 4.5 is the best coding model in the world. The model also brings significant improvements in reasoning and mathematical skills. On OSWorld, a benchmark that tests AI models on real-world computer tasks, Sonnet 4.5 leads with 61.4 percent. Four months ago, its predecessor Sonnet 4 scored 42.2 percent on the same test. Alongside the model, Anthropic is also introducing the Claude Agent SDK.
Artificial intelligence