
"Anthropic can present several benchmarks showing that the model outperforms its competitors. Claude Opus 4.6 scores highest on Terminal-Bench 2.0. This benchmark assesses agents based on their capabilities in terminal environments. To do this, Terminal-Bench 2.0 subjects each agent fed into the benchmark to a number of standard tasks. Opus 4.6 scores 65.4. The previous Anthropic version, Opus 4.5, scored 59.8. GPT-5.2-codex, released in December, comes closest at 64.7."
"The new adaptive thinking feature gives developers more control over how deeply the model thinks. Whereas previously only extended thinking could be turned on or off, Claude can now determine for itself when more thorough reasoning is useful. Four effort levels (low, medium, high, and max) offer additional flexibility. These levels allow the user to determine how many tokens Claude uses for a response. High is the default setting, but this can be adjusted."
Claude Opus 4.6 improves on its predecessor's coding and reasoning capabilities through better planning, adaptive thinking, and a 1-million-token context window. The model achieves a Terminal-Bench 2.0 score of 65.4, surpassing Opus 4.5 (59.8) and edging out GPT-5.2-codex (64.7). Opus 4.6 also leads on Humanity's Last Exam (scores from 40 to 53.1) and GDPval-AA (1606 versus GPT-5.2's 1462). Adaptive thinking introduces four effort levels that let developers control reasoning depth and token use, with high as the default. Context compaction summarizes older context to extend long sessions. Premium pricing applies above 200,000 tokens, and Claude in Excel has been upgraded to handle more complex tasks.
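The compaction idea described here can be approximated client-side: once a conversation nears a token threshold, older turns are replaced with a model-generated summary while recent turns stay verbatim. A rough sketch under stated assumptions: the threshold, the number of turns kept, and the summarization prompt are arbitrary choices, and this is an illustration of the concept rather than Anthropic's built-in mechanism.

```python
def compact(client, messages: list[dict], token_estimate: int,
            threshold: int = 150_000, keep_recent: int = 6) -> list[dict]:
    """Collapse older turns into a summary once the context nears the threshold.

    Illustrative only: not Anthropic's built-in compaction. The threshold sits
    below the 200,000-token premium-pricing boundary mentioned in the article.
    """
    if token_estimate < threshold or len(messages) <= keep_recent:
        return messages  # still comfortably within the window; change nothing

    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = client.messages.create(
        model="claude-opus-4-6",  # assumed model id
        max_tokens=1024,
        messages=old + [{
            "role": "user",
            "content": "Summarize the conversation so far in a few sentences, "
                       "preserving decisions, open questions, and key facts.",
        }],
    )
    summary_text = "".join(b.text for b in summary.content if b.type == "text")
    # Older turns become a single recap message; recent turns remain verbatim.
    recap = {"role": "user",
             "content": f"(Summary of earlier conversation) {summary_text}"}
    return [recap] + recent
```

The design choice is the usual compaction tradeoff: summarized turns lose detail but free up the window, which is what lets long agent sessions continue past the point where raw history alone would overflow the context.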
Read at Techzine Global