#ai-benchmarking

Artificial intelligence
from Futurism
2 months ago

Apple Researchers Just Released a Damning Paper That Pours Water on the Entire AI Industry

Apple researchers question the reasoning capabilities of leading AI models, calling current industry claims an 'illusion of thinking'.
from TechCrunch
3 months ago

LM Arena, the organization behind popular AI leaderboards, lands $100M

LM Arena has become an essential crowdsourced benchmarking project for AI labs, raising $100 million in seed funding to further its mission of evaluating AI models.
Artificial intelligence
from TechRepublic
3 months ago

OpenAI's o3: AI Benchmark Discrepancy Reveals Gaps in Performance Claims

The performance of OpenAI's o3 model on benchmarks significantly differed from earlier claims, revealing the complexity and variability in AI evaluations.
from TechCrunch
4 months ago

AI benchmarking platform Chatbot Arena forms a new company

Chatbot Arena is forming a company called Arena Intelligence Inc. to enhance its benchmarking capabilities significantly while maintaining neutrality in AI testing.
Artificial intelligence
from techcrunch.com
4 months ago

Debates over AI benchmarking have reached Pokémon

Last week, a post on X claimed Google's Gemini model surpassed Anthropic's Claude model in Pokémon, stirring controversy over AI benchmarks and implementation.
Artificial intelligence