Data science
From Medium, 5 days ago
AI KPIs That Matter: Moving Beyond Model Accuracy in 2026
Measuring AI success requires connecting model performance to business outcomes rather than focusing on accuracy metrics alone.
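To see why accuracy alone can mislead, consider a minimal sketch that scores two classifiers both by accuracy and by the business cost of their errors. The confusion-matrix counts, the per-error costs (`COST_FP`, `COST_FN`), and the fraud-review framing are all hypothetical values chosen for illustration. None of them come from the article.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary classifier's confusion matrix."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


def business_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    """Total cost of errors, given an assumed cost per error of each type."""
    return c.fp * cost_fp + c.fn * cost_fn


# Hypothetical results for two models evaluated on the same 1,000 cases.
model_a = Confusion(tp=80, fp=30, fn=20, tn=870)
model_b = Confusion(tp=95, fp=45, fn=5, tn=855)

COST_FP = 10.0   # hypothetical: cost of one unnecessary manual review
COST_FN = 200.0  # hypothetical: cost of one missed fraud case

for name, m in [("A", model_a), ("B", model_b)]:
    print(f"{name}: accuracy={m.accuracy():.2f}, "
          f"error cost={business_cost(m, COST_FP, COST_FN):,.0f}")
```

Both models reach 0.95 accuracy. Under these assumed costs, however, model A's errors cost 4,300 and model B's cost 1,450, because B trades a few cheap false positives for far fewer expensive false negatives. A KPI tied to the business outcome captures that difference, and accuracy does not.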
New research suggests an AI agent can't fully replace a human consultant, at least for now. Mercor, the AI training giant, tested how well leading AI models, acting as agents, performed real-world consulting, banking, and legal tasks. The models failed most of the time, but Mercor's CEO, Brendan Foody, told Business Insider that the results tell only part of the story.
Following the release of GPT-5.2 last week, OpenAI has begun rolling out a new image generation model. The company says the updated ChatGPT Images is four times faster than its predecessor. If you're a frequent ChatGPT user, you'll know it can sometimes take a while for OpenAI's servers to create images, particularly during peak times or if you're not paying for ChatGPT Plus. In that respect, any improvement in speed is welcome.
Chinese AI firm DeepSeek has made yet another splash with the release of V3.2, the latest iteration in its V3 model series. The model launched Monday and builds on an experimental V3.2 version announced in October. It comes in two versions: "Thinking" and a more powerful "Speciale." DeepSeek said V3.2 pushes the capabilities of open-source AI even further. Like other DeepSeek models, it costs a fraction of what proprietary models do, and its weights are available on Hugging Face.
Qwen3-Coder-480B-A35B, a mixture-of-experts model with 480B total parameters and 35B active per token, delivers state-of-the-art results on agentic coding and other code tasks, matching or outperforming Claude Sonnet 4, GPT-4.1, and Kimi K2. The model scores 61.8% on Aider Polyglot and supports a 256K-token context window, extendable to 1M tokens.