#ai-benchmarks

Artificial intelligence
from The Verge
14 hours ago

'Holy shit': Gemini 3 is winning the AI race - for now

Google's Gemini 3 immediately topped benchmarks and leaderboards, was integrated into Google Search on day one, and attracted over one million users within 24 hours.
Artificial intelligence
from www.cbc.ca
3 days ago

China might be winning the AI race. Does it matter?

Moonshot AI's Kimi K2 Thinking narrows China's AI performance gap with the U.S., scoring near ChatGPT on advanced reasoning benchmarks and outranking several rivals.
from ZDNET
6 days ago

Google's Gemini 3 is finally here and it's smarter, faster, and free to access

On Tuesday, Google finally launched Gemini 3, which the company claims is the "best model in the world for multimodal understanding and our most powerful agentic and vibe coding model yet." The claim is supported by benchmark data, crowd-sourced Arena results, and more advanced use cases that the chatbot has not previously been able to tackle.
Gadgets
Artificial intelligence
from www.theguardian.com
3 weeks ago

Experts find flaws in hundreds of tests that check AI safety and effectiveness

Hundreds of AI benchmarks contain flaws that undermine the validity of model safety and capability claims, making many evaluation scores misleading or irrelevant.
Artificial intelligence
from Techzine Global
3 weeks ago

JetBrains launches AI benchmark platform DPAI Arena

DPAI Arena provides an open, community-driven benchmark for objectively measuring AI coding agents across multiple languages, workflows, and reproducible evaluation pipelines.
from Fortune
1 month ago

AI models are getting very good at professional tasks, new OpenAI research shows

Google CEO Sundar Pichai was right when he said that while AI companies aspire to create AGI (artificial general intelligence), what we have right now is more like AJI, or artificial jagged intelligence. What Pichai meant is that today's AI is brilliant at some things, including tasks that even human experts find difficult, while performing poorly at others that a human would find relatively easy.
Artificial intelligence
#model-evaluation
Artificial intelligence
from InfoWorld
7 months ago

Learning how to measure genAI's impact

AI model improvements are often difficult to quantify accurately.
Smaller language models may outperform larger ones in practical applications.
The debate on AGI misdefines human intelligence benchmarks.