Output from vibe coding tools prone to critical security flaws, study finds

"Popular vibe coding platforms consistently generate insecure code in response to common programming prompts, including creating vulnerabilities rated as 'critical,' new testing has found. Security startup Tenzai's top-line conclusion: the tools are good at avoiding security flaws that can be solved in a generic way, but struggle where what distinguishes safe from dangerous depends on context. The assessment, which it conducted in December 2025, compared five of the best-known vibe coding tools (Claude Code, OpenAI Codex, Cursor, Replit, and Devin) by using pre-defined prompts to build the same three test applications."
"[Code generated by AI] agents seem to be very prone to business logic vulnerabilities. While human developers bring intuitive understanding that helps them grasp how workflows should operate, agents lack this 'common sense' and depend mainly on explicit instructions," said Tenzai's researchers. Offsetting this, the tools did a good job of avoiding common flaws that have long plagued human-coded applications, such as SQLi and XSS vulnerabilities, both of which still feature prominently in the OWASP Top 10 list of web application security risks.
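Why do SQL injection and XSS count as the "generic" kind of flaw? Their standard defenses are mechanical and independent of what the application does: bind user input as query parameters instead of splicing it into SQL, and escape it before writing it into HTML. The minimal sketch below illustrates that pattern using only Python's standard `sqlite3` and `html` modules; the `users` table and the `find_user` and `render_greeting` functions are hypothetical and are not drawn from Tenzai's test applications.

```python
import html
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the driver binds `username` as data, so input such as
    # "x' OR '1'='1" cannot change the structure of the SQL statement.
    cur = conn.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchall()


def render_greeting(name: str) -> str:
    # Escaping on output turns HTML metacharacters (<, >, &, quotes) into entities,
    # the generic defense against reflected XSS.
    return f"<p>Hello, {html.escape(name)}</p>"


if __name__ == "__main__":
    # Hypothetical in-memory database used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
    print(find_user(conn, "alice"))         # [(1, 'alice')]
    print(find_user(conn, "x' OR '1'='1"))  # [] (the payload is treated as a literal name)
    print(render_greeting("<script>alert(1)</script>"))
    # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Neither function needs to know anything about what the application is for, which is what makes this class of bug comparatively easy for a code generator to avoid.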
An assessment in December 2025 compared five popular vibe coding tools (Claude Code, OpenAI Codex, Cursor, Replit, and Devin), using pre-defined prompts to have each tool build the same three test applications. Across the 15 generated applications, the tools produced 69 vulnerabilities: about 45 low-to-medium, several high, and six critical. Critical flaws appeared only in outputs from Claude Code (4), Devin (1), and Codex (1). The most serious issues involved API authorization logic and business-logic vulnerabilities affecting e-commerce workflows. The tools avoided common flaws such as SQL injection and XSS but struggled where safety depended on contextual understanding of how a workflow is supposed to behave.
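Business-logic and authorization flaws are harder because the correct check depends on rules specific to the application. The hypothetical checkout sketch below (Python, not taken from the study) shows two such rules that no generic escaping or query binding would supply: the server must reject non-positive quantities and price items from its own catalog, and it must confirm that a requested order belongs to the requesting user before returning it. The catalog, the `Order` type, and the `create_order` and `get_order` functions are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical server-side catalog; prices (in cents) never come from the request.
CATALOG = {"sku-1": 1999, "sku-2": 500}


@dataclass
class Order:
    order_id: int
    owner_id: int
    total_cents: int


class OrderError(Exception):
    pass


# Hypothetical in-memory order store keyed by order ID.
ORDERS: dict[int, Order] = {}


def create_order(user_id: int, items: dict[str, int]) -> Order:
    total = 0
    for sku, qty in items.items():
        # Business rule: quantities must be positive integers. A negative quantity
        # would otherwise reduce the total and let a buyer pay less than owed.
        if type(qty) is not int or qty <= 0:
            raise OrderError(f"invalid quantity for {sku}: {qty!r}")
        if sku not in CATALOG:
            raise OrderError(f"unknown item: {sku}")
        # Price is looked up server-side rather than trusted from client input.
        total += CATALOG[sku] * qty
    order = Order(order_id=len(ORDERS) + 1, owner_id=user_id, total_cents=total)
    ORDERS[order.order_id] = order
    return order


def get_order(requesting_user_id: int, order_id: int) -> Order:
    order = ORDERS.get(order_id)
    # Object-level authorization: knowing an order ID is not enough to read it.
    # Missing and foreign orders get the same error so valid IDs are not revealed.
    if order is None or order.owner_id != requesting_user_id:
        raise OrderError("order not found")
    return order
```

For example, `create_order(1, {"sku-1": 2})` yields an order totaling 3998 cents; calling `get_order(2, ...)` on that order raises `OrderError`, as does `create_order(1, {"sku-2": -3})`. Both checks encode knowledge of how the store is supposed to work, which is the kind of context the researchers say agents lack unless the prompt states it explicitly.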
Read at InfoWorld