OpenAI Releases List of Work Tasks It Says ChatGPT Can Already Replace

"ChatGPT maker OpenAI has released a new evaluation, dubbed GDPval, to measure how well its AIs perform on 'economically valuable, real-world tasks across 44 occupations.' 'People often speculate about AI's broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing,' the company wrote in an accompanying blog post. 'Evaluations like GDPval help ground conversations about future AI improvements in evidence rather than guesswork, and can help us track model improvement over time,' OpenAI added."

"In 'early results,' GDPval found that 'today's best frontier models are already approaching the quality of work produced by industry experts' - a clear shot across the bow at critics who say the tech isn't up to the demands of the workplace. The 44 occupations where 'AI could have the highest impact on real-world productivity' included a litany of professions including real estate sales agents, social workers, industrial engineers, software developers, lawyers, registered nurses, customer service representatives, pharmacists, private detectives, and financial advisors."

"The specific tasks, as laid out in a paper, range from creating a 'competitor landscape for last mile delivery' for a financial analyst, assessing 'skin lesion images' for a registered nurse, and designing a sales brochure for a real estate agent. Surprisingly, the company found that its competitor Anthropic's Claude Opus 4.1 was the 'best performing model' after being graded by industry experts across 220 tasks, followed by GPT-"

OpenAI launched GDPval to evaluate AI performance on economically valuable, real-world tasks across 44 occupations. Industry experts graded model outputs across 220 tasks. Early results indicate that today's best frontier models are already approaching the quality of work produced by industry experts, with Anthropic's Claude Opus 4.1 rated the best-performing model. Evaluated occupations span real estate, healthcare, engineering, software development, law, finance, and customer service. Assessed tasks range from competitor landscape analysis and skin lesion image assessment to sales brochure design. The evaluation aims to ground discussion of AI's impact in evidence rather than guesswork and to track model improvement over time.

#ai-evaluation #gdpval #model-benchmarking #economic-impact

Read at Futurism
