#benchmark-performance tag

Xiaomi releases open-weight MiMo-V2.5 AI model, claims "frontier-level agentic capability"

Xiaomi's MiMo-V2.5 is an advanced open-weight AI model with superior multimodal capabilities and competitive performance against leading models.

Software development

fromTNW | Anthropic

3 weeks ago

Claude Opus 4.7 leads on SWE-bench and agentic reasoning, beating GPT-5.4 and Gemini 3.1 Pro

Claude Opus 4.7 is Anthropic's most capable model, outperforming competitors in software engineering and agentic reasoning with significant improvements.

fromTechzine Global

2 months ago

Anthropic acquires Vercept to optimize Claude's computer use

Computer use enables Claude to perform multi-step tasks in live applications, just as a person would at a keyboard. This means that the AI can solve problems that are impossible with code alone. Recent progress speaks for itself: on the OSWorld benchmark for computer use, the Sonnet models went from below 15 percent at the end of 2024 to 72.5 percent today.

Artificial intelligence

fromTechCrunch

5 months ago

Anthropic releases Opus 4.5 with new Chrome and Excel integrations | TechCrunch

On Monday, Anthropic announced Opus 4.5, the latest version of its flagship model. It's the last of Anthropic's 4.5 series of models to be released, following the launch of Sonnet 4.5 in September and Haiku 4.5 in October. As expected, the new version of Opus has state-of-the-art performance on a range of benchmarks, including coding benchmarks (SWE-Bench and Terminal-bench), tool use (tau2-bench and MCP Atlas) and general problem solving (ARC-AGI 2, GPQA Diamond).

Artificial intelligence

fromIT Pro

5 months ago

Google launches flagship Gemini 3 model and Google Antigravity, a new agentic AI development platform

Google has officially unveiled Gemini 3 Pro, its new state of the art LLM with record-breaking scores across almost every AI benchmark. The new model is intended to improve every Google service that uses Gemini, including its dedicated app, coding tools, and AI in search. Google stated that Gemini 3 Pro is much better at handling requests in their intended context and providing useful answers that don't resort to flattery. The model debuted in the number one spot across text, WebDev, and vision in , with Google having Gemini 3 Pro is the "best model in the world for complex multimodal understanding".

Artificial intelligence

Gadgets

fromZDNET

6 months ago

I saw the future of Windows PCs - and it may finally be time to ditch my MacBook

Qualcomm's Snapdragon X2 Elite Extreme delivers major CPU, GPU, and NPU performance gains while maintaining energy-efficient, all-day laptop operation.

Artificial intelligence

fromTechCrunch

7 months ago

Anthropic launches Claude Sonnet 4.5, its best AI model for coding | TechCrunch

Claude Sonnet 4.5 claims state-of-the-art coding performance and can build production-ready applications while maintaining the same developer pricing as Sonnet 4.

Artificial intelligence

fromInfoQ

8 months ago

DeepSeek Releases v3.1 Model with Hybrid Reasoning Architecture

DeepSeek V3.1 combines a hybrid thinking/non-thinking architecture, 128k-token context, FP8 precision, 671B parameters, and strong cost-efficient coding and reasoning performance.

Artificial intelligence

fromFuturism

8 months ago

Scientists Are Getting Seriously Worried That We've Already Hit Peak AI

GPT-5's release has not met expectations, showing limited improved utility despite better performance on benchmarks.

Mobile UX

fromGSMArena.com

9 months ago

OpenAI releases ChatGPT-5 - its best AI model to date with PhDlevel intelligence

GPT-5 is OpenAI's most advanced AI model, providing expert-level intelligence and enhanced performance across various fields.

Artificial intelligence

fromHackernoon

2 years ago

phi-3-mini: The 3.8B Powerhouse Reshaping LLM Performance on Your Phone | HackerNoon

Phi-3-mini is a high-performance language model that rivals larger models while being optimized for deployment on mobile devices.

#benchmark-performance#benchmark-performance

Xiaomi releases open-weight MiMo-V2.5 AI model, claims "frontier-level agentic capability"

Claude Opus 4.7 leads on SWE-bench and agentic reasoning, beating GPT-5.4 and Gemini 3.1 Pro

Anthropic acquires Vercept to optimize Claude's computer use

Anthropic releases Opus 4.5 with new Chrome and Excel integrations | TechCrunch

Google launches flagship Gemini 3 model and Google Antigravity, a new agentic AI development platform

I saw the future of Windows PCs - and it may finally be time to ditch my MacBook

Anthropic launches Claude Sonnet 4.5, its best AI model for coding | TechCrunch

DeepSeek Releases v3.1 Model with Hybrid Reasoning Architecture

Scientists Are Getting Seriously Worried That We've Already Hit Peak AI

OpenAI releases ChatGPT-5 - its best AI model to date with PhDlevel intelligence

phi-3-mini: The 3.8B Powerhouse Reshaping LLM Performance on Your Phone | HackerNoon

#benchmark-performance
#benchmark-performance