
"Commercial AI models and most open source models include some form of safety check or alignment process that mean they refuse to comply with unlawful or harmful requests. AutoGuard's authors designed their software to craft defensive prompts that stop AI agents in their tracks by triggering these built-in refusal mechanisms. AI agents consist of an AI component - one or more AI models - and software tools like Selenium, BeautifulSoup4, and Requests that the model can use to automate web browsing and information gathering."
"LLMs rely on two primary sets of instructions: system instructions that define in natural language how the model should behave, and user input. Because AI models cannot easily distinguish between the two, it's possible to make the model interpret user input as a system directive that overrides other system directives. Such overrides are called "direct prompt injection" and involve submitting a prompt to a model that asks it to "Ignore previous instructions.""
Researchers in South Korea developed AutoGuard, an agent defense that crafts indirect prompt injections to deter malicious AI agents from scraping data. The system exploits LLMs' safety checks by generating defensive prompts that trigger built-in refusal mechanisms, causing models to refuse unlawful or harmful requests. AutoGuard focuses on AI agents combining models with automation tools such as Selenium, BeautifulSoup4, and Requests. The approach contrasts with network-based defenses that block crawlers by IP addresses, request headers, or behavioral characteristics. AutoGuard leverages models' difficulty distinguishing system instructions from user input to induce compliance with safety policies. The software is described in a preprint under review for ICLR 2026.
Read at Theregister
Unable to calculate read time
Collection
[
|
...
]