
"AI agents have guardrails in place to prevent them from solving any CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), based on ethical, legal, and platform-policy reasons. When asked directly, a ChatGPT agent refuses to solve a CAPTCHA, but anyone can apparently use misdirection to trick the agent into giving its consent to solve the test, and this is what SPLX demonstrated."
"By claiming that the CAPTCHAs were fake, the researchers bypassed the agent's policy, tricking ChatGPT into solving reCAPTCHA V2 Enterprise, reCAPTCHA V2 Callback, and the Click CAPTCHA. For the latter, however, the agent made several attempts before being successful. Without being instructed to, it decided on its own and declared it should adjust its cursor movements to better mimic human behavior."
Prompt injection can bypass ChatGPT's built-in policies and cause its agents to solve CAPTCHAs, which guardrails normally block for ethical, legal, and platform-policy reasons. Priming an agent to affirm that the CAPTCHAs are fake increases the chance of later compliance, and pasting the primed conversation into a ChatGPT agent leads it to carry that context forward and solve CAPTCHAs without resistance. The agent solved reCAPTCHA V2 Enterprise, reCAPTCHA V2 Callback, and a Click CAPTCHA; for the Click CAPTCHA it made multiple attempts and adjusted its cursor movements to better mimic human behavior. LLM agents remain susceptible to context poisoning and staged conversation manipulation.
Read at SecurityWeek