
"And speaking of cost, Haiku 4.5 is included for subscribers of Claude web and app plans. Through the API (for developers), the small model is priced at $1-per-million input tokens and $5-per-million output tokens. That compares to Sonnet 4.5 at $3-per-million input and $15-per-million output tokens, and Opus 4.1 at $15-per-million input and a whopping $75-per-million output tokens. The model serves as a cheaper drop-in replacement for two older models, Haiku 3.5 and Sonnet 4."
""Users who rely on AI for real-time, low-latency tasks like chat assistants, customer service agents, or pair programming will appreciate Haiku 4.5's combination of high intelligence and remarkable speed," Anthropic writes. On SWE-bench Verified, a test that measures performance on coding tasks, Haiku 4.5 scored 73.3 percent compared to Sonnet 4's similar performance level (72.7 percent). The model also reportedly surpasses Sonnet 4 at certain tasks like using computers, according to Anthropic's benchmarks."
"Still, making a small, capable coding model may have unexpected advantages for agentic coding setups like Claude Code. Anthropic designed Haiku 4.5 to work alongside Sonnet 4.5 in multi-model workflows. In such a configuration, Anthropic says, Sonnet 4.5 could break down complex problems into multi-step plans, then coordinate multiple Haiku 4.5 instances to complete subtasks in parallel, like spinning off workers to get things done faster."
Haiku 4.5 is a small, low-latency model included with Claude web and app plans and available via API at $1 per million input and $5 per million output tokens. Pricing undercuts Sonnet 4.5 and Opus 4.1 while positioning Haiku 4.5 as a drop-in replacement for Haiku 3.5 and Sonnet 4. The model scored 73.3 percent on SWE-bench Verified versus Sonnet 4's 72.7 percent and reportedly outperforms Sonnet 4 on some tasks. Haiku 4.5 is intended for real-time assistants and to operate alongside Sonnet 4.5 in multi-model workflows to parallelize subtasks. Benchmark results are self-reported and warrant cautious interpretation.
Read at Ars Technica