A study from Carnegie Mellon University tested AI agents in a simulated software company, revealing how poorly they handle the kinds of tasks a typical employee performs. The project pitted models from major firms, including Google and OpenAI, against routine workplace assignments; even the best model completed only 24 percent of its tasks, while the rest fared worse. Researchers attribute the agents' struggles to a lack of common sense, weak social skills, and inefficient task execution, concluding that these shortcomings will keep AI from taking over human jobs anytime soon.
The best-performing model was Anthropic's Claude 3.5 Sonnet, which completed just 24 percent of the jobs assigned to it, underscoring the current limits of autonomous AI agents.
Researchers noted that the agents are plagued by a lack of common sense, weak social skills, and a poor grasp of how to navigate the internet.