
"On November 2, 1988, graduate student Robert Morris released a self-replicating program into the early Internet. Within 24 hours, the Morris worm had infected roughly 10 percent of all connected computers, crashing systems at Harvard, Stanford, NASA, and Lawrence Livermore National Laboratory. The worm exploited security flaws in Unix systems that administrators knew existed but had not bothered to patch. Morris did not intend to cause damage."
"He wanted to measure the size of the Internet. But a coding error caused the worm to replicate far faster than expected, and by the time he tried to send instructions for removing it, the network was too clogged to deliver the message. History may soon repeat itself with a novel new platform: networks of AI agents carrying out instructions from prompts and sharing them with other AI agents, which could spread the instructions further."
"Security researchers have already predicted the rise of this kind of self-replicating adversarial prompt among networks of AI agents. You might call it a "prompt worm" or a "prompt virus." They're self-replicating instructions that could spread through networks of communicating AI agents similar to how traditional worms spread through computer networks. But instead of exploiting operating system vulnerabilities, prompt worms exploit the agents' core function: following instructions."
"When an AI model follows adversarial directions that subvert its intended instructions, we call that "prompt injection," a term coined by AI researcher Simon Willison in 2022. But prompt worms are something different. They might not always be "tricks." Instead, they could be shared voluntarily, so to speak, among agents who are role-playing human-like reactions to prompts from other AI agents."
Self-replicating prompts, or "prompt worms," could propagate through networks of AI agents by being forwarded from agent to agent, exploiting the agents' instruction-following behavior rather than any operating system vulnerability. Historical precedent exists: the 1988 Morris worm spread rapidly through unpatched Unix systems after a coding error, causing widespread outages. Prompt injection, in which a model follows adversarial directions that subvert its intended instructions, is an already-known class of attack; prompt worms would extend it across agent networks. Networks of agents that exchange prompts raise the contagion risk, because instructions can propagate among agents faster than humans can intervene.
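That contagion dynamic can be made concrete with a toy model. The Python sketch below is not from the article; the network size, fan-out, and compliance probability are illustrative assumptions. It treats each agent as a node that, once it has accepted the self-replicating prompt, forwards it to its peers, each of which complies with some probability. The "exploit" is nothing more than ordinary instruction-following.

import random

random.seed(0)

NUM_AGENTS = 50          # agents in the network (illustrative)
NEIGHBORS_PER_AGENT = 4  # peers each agent forwards prompts to (illustrative)
COMPLY_PROB = 0.6        # chance an agent follows a forwarded prompt (illustrative)

# Random communication graph: each agent talks to a few others.
neighbors = {
    a: random.sample([x for x in range(NUM_AGENTS) if x != a], NEIGHBORS_PER_AGENT)
    for a in range(NUM_AGENTS)
}

infected = {0}  # patient zero receives the self-replicating prompt
for step in range(10):
    newly_infected = set()
    for agent in infected:
        for peer in neighbors[agent]:
            # A peer that complies re-broadcasts the prompt next round:
            # instruction-following itself is the attack surface.
            if peer not in infected and random.random() < COMPLY_PROB:
                newly_infected.add(peer)
    if not newly_infected:
        break
    infected |= newly_infected
    print(f"round {step + 1}: {len(infected)}/{NUM_AGENTS} agents infected")

Even with these modest parameters, the prompt typically reaches most of the network within a handful of rounds, which is the mechanic behind the "faster than humans can intervene" concern.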
Read at Ars Technica