#model-benchmarking

from Computerworld
2 days ago

OpenAI's GPT is getting better at mathematics

OpenAI's GPT-5.2 Pro does better at solving sophisticated math problems than older versions of the company's top large language model, according to a new study by Epoch AI, a non-profit research institute.
Artificial intelligence
from Big Think
1 month ago

Inside the meteoric rise of Mercor

Expert-labeled AI evaluations propelled Mercor's rapid growth, becoming a critical industry benchmark and revenue driver for top model developers and tech giants.
Artificial intelligence
from LogRocket Blog
2 months ago

AI dev tool power rankings & comparison [Nov 2025]

An evidence-based power ranking and 50+ feature comparison identifies top AI models and AI-powered development tools for frontend development as of November 2025.
Artificial intelligence
from Futurism
3 months ago

OpenAI Releases List of Work Tasks It Says ChatGPT Can Already Replace

OpenAI's GDPval evaluates AI performance across 44 occupations and finds frontier models approaching expert-level quality with clear economic relevance.