#benchmark-performance

from TechCrunch
5 days ago

Anthropic releases Opus 4.5 with new Chrome and Excel integrations | TechCrunch

On Monday, Anthropic announced Opus 4.5, the latest version of its flagship model. It's the last of Anthropic's 4.5 series of models to be released, following the launch of Sonnet 4.5 in September and Haiku 4.5 in October. As expected, the new version of Opus has state-of-the-art performance on a range of benchmarks, including coding benchmarks (SWE-Bench and Terminal-Bench), tool use (tau2-bench and MCP Atlas) and general problem solving (ARC-AGI-2, GPQA Diamond).
Artificial intelligence
from IT Pro
1 week ago

Google launches flagship Gemini 3 model and Google Antigravity, a new agentic AI development platform

Google has officially unveiled Gemini 3 Pro, its new state-of-the-art LLM, with record-breaking scores across almost every AI benchmark. The new model is intended to improve every Google service that uses Gemini, including its dedicated app, coding tools, and AI in Search. Google stated that Gemini 3 Pro is much better at handling requests in their intended context and at providing useful answers that don't resort to flattery. The model debuted in the number one spot on the text, WebDev, and vision leaderboards, with Google claiming that Gemini 3 Pro is the "best model in the world for complex multimodal understanding".
Artificial intelligence
Gadgets
from ZDNET
1 month ago

I saw the future of Windows PCs - and it may finally be time to ditch my MacBook

Qualcomm's Snapdragon X2 Elite Extreme delivers major CPU, GPU, and NPU performance gains while maintaining energy-efficient, all-day laptop operation.
Artificial intelligence
from TechCrunch
2 months ago

Anthropic launches Claude Sonnet 4.5, its best AI model for coding | TechCrunch

Claude Sonnet 4.5 claims state-of-the-art coding performance and can build production-ready applications while maintaining the same developer pricing as Sonnet 4.
Artificial intelligence
from InfoQ
2 months ago

DeepSeek Releases v3.1 Model with Hybrid Reasoning Architecture

DeepSeek V3.1 combines a hybrid thinking/non-thinking architecture, 128k-token context, FP8 precision, 671B parameters, and strong cost-efficient coding and reasoning performance.
Mobile UX
from GSMArena.com
3 months ago

OpenAI releases ChatGPT-5 - its best AI model to date with PhD-level intelligence

GPT-5 is OpenAI's most advanced AI model, providing expert-level intelligence and enhanced performance across various fields.
Artificial intelligence
from Hackernoon
1 year ago

phi-3-mini: The 3.8B Powerhouse Reshaping LLM Performance on Your Phone | HackerNoon

Phi-3-mini is a high-performance language model that rivals larger models while being optimized for deployment on mobile devices.