OpenAI says GPT-5 stacks up to humans in a wide range of jobs

"The test, GDPval, is an early attempt at understanding how close OpenAI's systems are to outperforming humans at economically valuable work - a key part of the company's founding mission to develop artificial general intelligence or AGI. OpenAI says its found that its GPT-5 model and Anthropic's Claude Opus 4.1 "are already approaching the quality of work produced by industry experts.""

"For example, one prompt asked investment bankers to create a competitor landscape for the last mile delivery industry, and compared them to AI-generated reports. OpenAI then averages an AI model's "win rate" against the human reports across all 44 occupations. For GPT-5-high, a souped up version of GPT-5 with extra computational power, the company says the AI model was ranked as better than or on par with industry experts 40.6% of the time."

"That's not to say that OpenAI's models are going to start replacing humans in their jobs immediately. Despite some CEOs' predictions that AI will take the jobs of humans in just a few years, OpenAI admits that GDPval today covers a very limited number of tasks people do in their real jobs. However, it is one of the latest ways the company is measuring AI's progress towards this milestone."

GDPval measures AI performance on economically valuable work across the nine industries that contribute most to U.S. GDP, covering 44 occupations ranging from software engineers to nurses and journalists. Experienced professionals compare AI-generated outputs with human-produced work and pick the better one; OpenAI then averages a model's win rate across occupations. GPT-5-high was ranked as better than or on par with industry experts 40.6% of the time. OpenAI says GPT-5 and Anthropic's Claude Opus 4.1 are approaching expert-quality work, but GDPval currently covers only a limited set of the tasks people do in their real jobs and does not imply immediate widespread replacement of human workers.
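
To make the averaging step concrete, here is a minimal Python sketch of how a per-occupation win rate could be rolled up into a single figure. It is an illustration only: the occupation names, the verdicts, and the function names are invented for the example, and it assumes that ties count in the model's favor (matching "better than or on par with") and that occupations are weighted equally. The article does not confirm either detail of OpenAI's actual calculation.

```python
from statistics import mean

# Hypothetical grader verdicts per occupation (made-up data, not GDPval results).
# Each entry records which output the expert grader preferred.
verdicts = {
    "investment banker": ["ai", "human", "tie", "human"],
    "nurse": ["human", "human", "ai", "human"],
    "journalist": ["tie", "ai", "human", "human"],
}

def occupation_win_rate(results):
    """Share of comparisons where the AI output was judged better or on par."""
    favorable = sum(1 for r in results if r in ("ai", "tie"))
    return favorable / len(results)

def overall_win_rate(verdicts_by_occupation):
    """Unweighted average of per-occupation win rates (an assumption)."""
    return mean(occupation_win_rate(r) for r in verdicts_by_occupation.values())

per_occupation = {occ: occupation_win_rate(r) for occ, r in verdicts.items()}
overall = overall_win_rate(verdicts)
print(f"Overall win rate: {overall:.1%}")
```

Averaging per occupation first, rather than pooling every comparison, keeps an occupation with many tasks from dominating the headline number; whether GDPval weights occupations this way is not stated in the article.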

#ai-benchmarking #economic-impact #gpt-5 #claude-opus-41

Read at TechCrunch
