
"Even the best AI coding models succeeded less than 23% of the time when working on real production code. Most models scored above 85% on popular benchmarks, but averaged just 17% success on production maintainability tasks."
"AI coding ROI varied dramatically by language and task. Success rates ranged from 32% in JavaScript to just 4% in C, and dropped as low as 1.5% on complex architectural tasks."
"Dropping AI into an operation will not deliver results without work behind the scenes, including on maintainability. To count as successful, AI-generated code needed to meet strict criteria."
Even the best AI coding models succeed less than 23% of the time on real production code, despite high benchmark scores. A study that evaluated 57 LLMs on maintainability tasks found a wide gap between benchmark performance and production results. Success rates also varied by programming language, from 32% in JavaScript down to 4% in C. The findings indicate that deploying AI alone does not guarantee results; substantial work on maintainability is needed before generated code holds up in production.
Read at ZDNET