Claude Sonnet 4.5 can code autonomously for 30 hours

"Anthropic claims that Claude Sonnet 4.5 is the best code model in the world. It also comes with significant improvements in reasoning and mathematical skills. On OSWorld, a benchmark for AI models that perform real-world computing tasks, Sonnet 4.5 leads with 61.4 percent. Four months ago, Sonnet 4 scored 42.2 percent on this test. Along with the model, Anthropic is also introducing the Claude Agent SDK."

"Claude Sonnet 4.5 scores highest on SWE-bench Verified, an evaluation that measures real-world software development skills. Here, the percentage is 77.2, compared to 74.5 percent for Opus 4.1 and GPT-5 Codex. According to Anthropic, the model can remain focused on complex, multi-step tasks for more than 30 hours. This is a significant improvement over previous versions. As a result, Claude Sonnet 4.5 can code autonomously for 30 hours."

"Claude Sonnet 4.5 is available today via the Claude API under the name claude-sonnet-4-5. Pricing remains the same as Claude Sonnet 4: $3 per million input tokens and $15 per million output tokens. In addition to the standard features, Anthropic has also added checkpoints to Claude Code, one of the most requested features. Users can now save their progress and instantly return to a previous state. A native VS Code extension is also available."
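The quoted list prices lend themselves to a quick budget estimate. The sketch below is a hypothetical helper (the name `estimate_cost_usd` is illustrative, not part of any Anthropic SDK) that applies only the two rates stated above, $3 per million input tokens and $15 per million output tokens. It assumes those flat rates apply and ignores any discounts or other pricing tiers Anthropic may offer.

```python
# Hypothetical cost estimator for claude-sonnet-4-5 usage.
# Assumption: flat list prices as quoted in the article; no discounts applied.
INPUT_PRICE_PER_MILLION = 3.00    # USD per million input tokens
OUTPUT_PRICE_PER_MILLION = 15.00  # USD per million output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example workload: 2 million input tokens, 500,000 output tokens.
    print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")  # prints "$13.50"
```

Output tokens cost five times as much as input tokens at these rates, so long autonomous sessions that generate a lot of code are dominated by the output side of the bill.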

Claude Sonnet 4.5 improves coding, reasoning, and mathematical skills and achieves 61.4% on the OSWorld benchmark, up from 42.2% for Sonnet 4 four months earlier. Anthropic also released the Claude Agent SDK to developers; it addresses memory, access rights, and subagent coordination, problems the company worked on for six months. Sonnet 4.5 scores 77.2% on SWE-bench Verified, ahead of the 74.5% reached by Opus 4.1 and GPT-5 Codex, and can remain focused on complex multi-step tasks for more than 30 hours, enabling long autonomous coding sessions. The model is available via the API as claude-sonnet-4-5 at unchanged pricing, and the release adds checkpoints to Claude Code and a native VS Code extension. Anthropic also reports notable improvements in safety and alignment.

#code-generation #benchmark-results #agent-sdk #safety-alignment

Read at Techzine Global
