OpenAI study suggests AI may be about to eclipse human expertise in real-world tasks | Fortune

"Rarely does a 29-page scholarly paper merit the attention of top-level executives, but every business leader should be familiar with a recent study from OpenAI. It's the best description yet of how AI can handle real-world tasks, showing which AI models are excelling, and hinting at what it all means for humans in the years ahead. The paper can be heavy going, but you can get a masterful summary from our AI Editor, Jeremy Kahn."
"The study is highly realistic. It examined 44 occupations and 1,320 specialized tasks required by those occupations. For example: the final testing step in manufacturing a cable spooling truck for underground mining operations. Appropriate professionals (average experience: 14 years) vetted the tasks, all of which are elements of actual work deliverables. Previous research has almost always focused on less realistic tests."
"The best models are already nearly as good as human industry experts. The study examined seven AI models from OpenAI, Google's Gemini, xAI's Grok, and Anthropic's Claude. The clear winner was Claude Opus 4.1, which came within a few percentage points of reaching parity with human industry experts. The best models also completed tasks about 100 times faster and 100 times cheaper than the industry experts, though the comparisons ignore 'the human oversight, iteration, and integration steps required in real workplace settings,' OpenAI says."
OpenAI evaluated AI performance across 44 occupations and 1,320 specialized tasks. Professionals with an average of 14 years' experience vetted the tasks, which are drawn from real work deliverables, and blind human graders assessed the outputs. Seven models from OpenAI, Google, xAI, and Anthropic were tested; Claude Opus 4.1 came within a few percentage points of human industry experts. The top models completed tasks about 100 times faster and 100 times cheaper than the experts, though those comparisons exclude the human oversight, iteration, and integration needed in workplace settings. Rapid model improvement implies significant near-term impacts on jobs, costs, and operations. Separately, markets reacted to geopolitical and export developments.