OpenAI Is Asking Contractors to Upload Work From Past Jobs to Evaluate the Performance of AI Agents

"The project appears to be part of OpenAI's efforts to establish a human baseline for different tasks that can then be compared with AI models. In September, the company launched a new evaluation process to measure the performance of its AI models against human professionals across a variety of industries. OpenAI says this is a key indicator of its progress towards achieving AGI, or an AI system that outperforms humans at most economically valuable tasks."

"'We've hired folks across occupations to help collect real-world tasks modeled off those you've done in your full-time jobs, so we can measure how well AI models perform on those tasks,' reads one confidential document from OpenAI. 'Take existing pieces of long-term or complex work (hours or days+) that you've done in your occupation and turn each into a task.'"

"OpenAI is asking contractors to describe tasks they've done in their current job or in the past and to upload real examples of work they did, according to an OpenAI presentation about the project viewed by WIRED. Each of the examples should be 'a concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo,' the presentation notes."

OpenAI is asking third-party contractors to upload real assignments and deliverables from their current or past workplaces so it can evaluate next-generation AI models. The goal is a human baseline across occupations against which AI performance can be compared. A new evaluation process, launched in September, measures models against human professionals in a range of industries, and OpenAI treats the results as an indicator of progress toward AGI. Each task submitted to the project consists of two parts: the managerial request and the deliverable produced in response. Contractors supply both, uploading either the original files or realistic fabricated examples. OpenAI and Handshake AI declined to comment.

#openai #ai-model-evaluation #training-data-collection #agi-benchmarking

Read at WIRED
