OpenAI's Deep Research has more fact-finding stamina than you, but it's still wrong half the time
Briefly

OpenAI's recent developments in generative AI include Deep Research, a technology that leverages web access to improve performance on question-answering tasks. While it demonstrates greater efficiency and persistence than human researchers, the system still fails on nearly half of its tasks. The latest benchmark, BrowseComp, evaluates the browsing capabilities of AI agents across a variety of topics, highlighting their potential advantages in recall and multitasking. However, inconsistent accuracy remains a significant limitation for these models.
"Machine intelligence, on the other hand, has much more extensive recall and can operate tirelessly without getting distracted," write Wei and team.
The premise is that AI agents -- that is, AI models that can browse 'thousands of web pages' -- could be far more resourceful than humans.
Read at ZDNET