#coding-benchmarks tag

Elon Musk is tearing xAI down to build it back up. Again.

xAI has lost ten of its twelve original co-founders, is underperforming in coding benchmarks, and Musk is rebuilding the company from scratch for the second time.

Artificial intelligence

fromInfoQ

2 months ago

Moonshot AI Releases Open-Weight Kimi K2.5 Model with Vision and Agent Swarm Capabilities

Kimi K2.5 is an open-weight multimodal LLM excelling at coding and parallel agent workflows, with vision integration and a PARL-trained orchestrator.

Artificial intelligence

fromArs Technica

5 months ago

Anthropic introduces cheaper, more powerful, more efficient Opus 4.5 model

Opus 4.5 improves conversational memory to avoid abrupt hard-stops and raises coding benchmark accuracy above 80 percent while excelling at agentic coding.

fromArs Technica

5 months ago

Google unveils Gemini 3 AI model and AI-first IDE called Antigravity

Now, Google's increasingly unavoidable AI is getting an upgrade. Gemini 3 Pro is available in a limited form today, featuring more immersive, visual outputs and fewer lies, Google says. The company also says Gemini 3 sets a new high-water mark for vibe coding, and Google is announcing a new AI-first integrated development environment (IDE) called Antigravity, which is also available today.

Artificial intelligence

fromArs Technica

6 months ago

Anthropic's Claude Haiku 4.5 matches May's frontier model at fraction of cost

Haiku 4.5 is a faster, lower-cost small coding model designed for real-time tasks and multi-model workflows, offering near-Sonnet and competitive GPT-5 benchmark performance.

Artificial intelligence

fromArs Technica

11 months ago

New Claude 4 AI model refactored code for 7 hours straight

Anthropic's Claude Opus 4 excels at coding, outperforming Gemini, and can operate independently for extended periods.

#coding-benchmarks#coding-benchmarks

Elon Musk is tearing xAI down to build it back up. Again.

Moonshot AI Releases Open-Weight Kimi K2.5 Model with Vision and Agent Swarm Capabilities

Anthropic introduces cheaper, more powerful, more efficient Opus 4.5 model

Google unveils Gemini 3 AI model and AI-first IDE called Antigravity

Anthropic's Claude Haiku 4.5 matches May's frontier model at fraction of cost

New Claude 4 AI model refactored code for 7 hours straight

#coding-benchmarks
#coding-benchmarks