
ChatGPT can fail to distinguish its own generated content from attacker-controlled Markdown embedded in externally sourced web content, according to Permiso threat hunter Andi Ahmeti. When a user asks for a summary of a page that contains hidden instructions, the page itself becomes the payload and the attacker can steer the model's output. Attackers can inject phishing URLs into responses or have the model display fake security alerts written in its own style. The technique can also pivot from a victim's browser to their mobile device by rendering an inline QR code that leads to attacker-hosted content. Because the link is opened on the phone, it bypasses desktop URL defenses such as blocklists and password-manager domain checks. Ahmeti warns that the attack surface grows as AI products render untrusted content directly and come to resemble browser or operating system environments.
"EXCLUSIVE: ChatGPT can't tell its own generated content from attacker-controlled Markdown pulled from external sources, according to a researcher who found the prompt injection technique and reported it to OpenAI. This means that if a user asks the chatbot to summarize a web page that contains hidden instructions, the page can become the payload."
"An attacker could abuse this blind trust to inject phishing URLs into ChatGPT responses, or even trick the model into showing fake security alerts written in ChatGPT's own style, Permiso threat hunter Andi Ahmeti told The Register. In a report shared with us ahead of publication, Ahmeti also demonstrated how criminals could exploit this trust issue to pivot their attack from a victim's browser to their mobile device by displaying an inline QR code."
"The victim scans the QR code with their phone and is taken to content hosted in an attacker-controlled S3 bucket, and this allows the baddie to bypass every desktop URL defense, including blocklists and password-manager domain checks, Ahmeti warned. 'AI systems increasingly render untrusted content directly inside browsers, which expands risk significantly,' he told us. 'The bigger issue is that AI products are starting to resemble browser or operating system environments, which creates a much larger security surface.'"
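The trust problem described in the excerpts above can be illustrated with a minimal sketch. It assumes a client that treats any Markdown link or image in text derived from an external page as untrusted, and replaces it with inert text unless its host is on an allowlist. This is not OpenAI's or Permiso's method: the `TRUSTED_HOSTS` allowlist, the function names, and the regular expression are all hypothetical, and the pattern covers only simple inline links and images, not reference-style links or raw HTML.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist (an assumption for this sketch, not a real policy).
TRUSTED_HOSTS = {"openai.com", "chatgpt.com"}

# Matches simple inline Markdown links [text](url) and images ![alt](url).
MD_TARGET = re.compile(r"(!?)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def is_trusted(url: str) -> bool:
    """Return True if the URL's host is a trusted host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in TRUSTED_HOSTS)


def neutralize_untrusted_markdown(text: str) -> tuple[str, list[str]]:
    """Replace links and images that point at untrusted hosts with inert text.

    Returns the rewritten text plus the list of URLs that were removed.
    URLs with no parseable host (for example relative paths) are also removed.
    """
    removed: list[str] = []

    def _replace(match: re.Match) -> str:
        bang, label, url = match.group(1), match.group(2), match.group(3)
        if is_trusted(url):
            return match.group(0)  # leave trusted targets untouched
        removed.append(url)
        kind = "image" if bang else "link"
        return f"[{kind} removed: {label or 'no label'}]"

    return MD_TARGET.sub(_replace, text), removed
```

Under these assumptions, an inline QR code image served from an attacker-controlled bucket would be replaced by a plain-text placeholder before rendering, while links to allowlisted hosts pass through unchanged. A filter like this addresses only the rendering step; it does not stop hidden instructions from influencing the wording of a summary.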
"Ahmeti doesn't know if the flaw has been fixed. We don't either, because OpenAI did not respond to The Register's questions, including: Have you fixed this? Ahmeti disclosed the security issue - he calls it 'ChatGPhish' - to OpenAI a couple of months back, submitting his initial vulnerability report via Bugcrowd's disclosure program on April 29 and then revising his report on May 1. 'The initial submission was marked as not reproducible,' he said. 'We resubmitted with additional detail and it was marked as a duplicate.'"
Read at The Register