The "First AI Software Engineer" Is Bungling the Vast Majority of Tasks It's Asked to Do

from Futurism 2 months ago

Research by Answer.AI found that Cognition's Devin, marketed as the first AI software engineer, failed most of the software tasks it was given. Over a month, researchers tested Devin on 20 distinct tasks and recorded 14 failures, three inconclusive results, and only three successes, a 15% success rate. The researchers also could not predict which tasks would succeed, and Devin often pursued impossible solutions, raising concerns about its reliability as a tool for developers. The findings underscore the gap between marketing claims and the actual performance of AI in complex engineering roles.

Out of 20 tasks we attempted, we saw 14 failures, three inconclusive results, and just three successes, leading to a meager success rate of just 15 percent.
Futurism: https://futurism.com/first-ai-software-engineer-devin-bungling-tasks

More concerning was our inability to predict which tasks would succeed... The autonomous nature that seemed promising became a liability.
Futurism: https://futurism.com/first-ai-software-engineer-devin-bungling-tasks

#ai-technology #software-engineering #cognition #performance-analysis
