Mozilla says 271 vulnerabilities found by Mythos have "almost no false positives"
Briefly

"Mozilla on Thursday provided a behind-the-scenes look into its use of Anthropic Mythos-an AI model for identifying software vulnerabilities-to ferret out 271 Firefox security flaws over two months. In a post, Mozilla engineers said the finally ready-for-prime-time breakthrough they achieved was primarily the result of two things: (1) improvement in the models themselves and (2) Mozilla's development of a custom " harness " that supported Mythos as it analyzed Firefox source code."
"The engineers said their earlier brushes with AI-assisted vulnerability detection were fraught with "unwanted slop." Typically, someone would prompt a model to analyze a block of code. The model would then produce plausible-reading bug reports, and often at unprecedented scales. Invariably, however, when human developers further investigated, they'd find a large percentage of the details had been hallucinated. The humans would then need to invest significant work handling the vulnerability reports the old-fashioned way."
"Mozilla's work with Mythos was different, Mozilla Distinguished Engineer Brian Grinstead said in an interview. The biggest differentiating factor was use of an agent harness, a piece of code that wraps around an LLM to guide it through a series of specific tasks. For such a harness to be useful, it requires significant resources to customize it to the project-specific semantics, tooling, and processes it will be used for."
Mozilla used Anthropic Mythos to identify 271 Firefox security flaws over two months. Engineers said the results came mainly from improvements to the model and from a custom harness that guided the model while analyzing Firefox source code. Earlier AI-assisted attempts produced “unwanted slop,” with bug reports that looked plausible but contained hallucinated details, requiring developers to spend significant effort validating and handling reports. Mozilla’s approach used an agent harness that wraps around a large language model and directs it through specific tasks. The harness required substantial customization to match project semantics, tooling, and processes.
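To make the idea of an agent harness concrete, here is a minimal Python sketch of code that wraps a model call and guides it through two fixed steps: proposing candidate findings, then keeping only those that pass a project-specific check. It is an illustration under stated assumptions, not Mozilla's harness: the names `run_harness`, `Finding`, `fake_model`, and `fake_verify` are hypothetical, and the model is replaced by a stub so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    location: str
    description: str
    verified: bool = False


# Hypothetical model interface: takes a prompt string, returns text.
Model = Callable[[str], str]


def run_harness(
    model: Model,
    source: str,
    verify: Callable[[Finding], bool],
) -> List[Finding]:
    """Guide the model through two steps: propose findings, then filter them."""
    # Step 1: ask for candidates in a fixed "location: description" format.
    prompt = (
        "List possible security flaws, one per line, "
        "formatted as 'location: description'.\n\n" + source
    )
    candidates = []
    for line in model(prompt).splitlines():
        if ":" in line:
            location, description = line.split(":", 1)
            candidates.append(Finding(location.strip(), description.strip()))

    # Step 2: keep only candidates that pass the project-specific check,
    # so unverified (possibly hallucinated) reports never reach a human.
    confirmed = []
    for finding in candidates:
        if verify(finding):
            finding.verified = True
            confirmed.append(finding)
    return confirmed


# Stub model and verifier, assumptions for illustration only.
def fake_model(prompt: str) -> str:
    return (
        "parse_header: length field is not bounds-checked\n"
        "render_text: stale pointer reused after cache eviction"
    )


def fake_verify(finding: Finding) -> bool:
    # A real check might build the project and try to reproduce the issue.
    return finding.location == "parse_header"


findings = run_harness(fake_model, "/* source under review */", fake_verify)
for f in findings:
    print(f.location, "-", f.description)
```

The design point the sketch tries to capture is the verification step: the harness, not the human reviewer, is responsible for discarding reports that cannot be confirmed. In a real project that step is where the project-specific customization would live.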
Read at Ars Technica