#swe-bench

Artificial intelligence
from ZDNET
3 days ago

Claude Sonnet 4.5 could be your next breakthrough coding tool - how to access it today

Claude Sonnet 4.5 outperforms previous Anthropic models and competitors on coding benchmarks, with enhanced agentic, reasoning, and long-running task capabilities.
Artificial intelligence
from InfoQ
1 month ago

Anthropic's Claude Opus 4.1 Improves Refactoring and Safety, Scores 74.5% SWE-bench Verified

Claude Opus 4.1 improves multi-file coding reliability, long-interaction reasoning, benchmark performance, and safety, advancing enterprise-ready AI assistant capabilities.