
""The User Alignment Critic runs after the planning is complete to double-check each proposed action," Google said. "Its primary focus is task alignment: determining whether the proposed action serves the user's stated goal. If the action is misaligned, the Alignment Critic will veto it.""
""When an action is rejected, the Critic provides feedback to the planning model to re-formulate its plan, and the planner can return control to the user if there are repeated failures," Nathan Parker from the Chrome security team said."
"The component is designed to view only metadata about the proposed action and is prevented from accessing any untrustworthy web content, thereby ensuring that it is not poisoned through malicious prompts that may be included in a website."
Chrome implements layered defenses to reduce the risk of exploitation through indirect prompt injections and untrusted web content. A User Alignment Critic uses a second model to independently evaluate each proposed agent action after planning, focusing on task alignment and vetoing misaligned actions. The Critic views only metadata about the action and cannot access untrustworthy web content, which prevents it from being poisoned by malicious prompts embedded in a website. When an action is rejected, the Critic gives feedback to the planner, which can reformulate its plan or return control to the user after repeated failures. Agent Origin Sets restrict the agent's access to data from origins relevant to the task or sources the user has chosen to share, which addresses site isolation bypasses.
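The control flow described above can be illustrated with a small sketch. This is not Chrome's implementation: `ActionMetadata`, `Verdict`, `origin_of`, `next_action`, the retry limit of 3, and the idea of checking the origin set before consulting the critic are all assumptions made for illustration. The point is only to show a critic that sees metadata rather than page content, a veto that feeds back into the planner, and a hand-back to the user after repeated rejections.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ActionMetadata:
    """Hypothetical metadata view of a proposed action (no page content)."""
    kind: str        # e.g. "navigate", "click", "send_email"
    target_url: str  # where the action would take effect
    summary: str     # planner's own short description


@dataclass(frozen=True)
class Verdict:
    approved: bool
    feedback: str = ""


# Hypothetical signatures: the planner gets the goal plus the last feedback,
# the critic gets the goal plus metadata only.
Planner = Callable[[str, Optional[str]], ActionMetadata]
Critic = Callable[[str, ActionMetadata], Verdict]


def origin_of(url: str) -> str:
    """Return scheme://host[:port], dropping default ports."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    host = parts.hostname or ""
    if parts.port is None or parts.port == default_port:
        return f"{parts.scheme}://{host}"
    return f"{parts.scheme}://{host}:{parts.port}"


def next_action(
    goal: str,
    planner: Planner,
    critic: Critic,
    allowed_origins: AbstractSet[str],
    max_rejections: int = 3,  # assumed limit, not a documented value
) -> Optional[ActionMetadata]:
    """Return an approved action, or None to hand control back to the user."""
    feedback: Optional[str] = None
    for _ in range(max_rejections):
        proposal = planner(goal, feedback)
        origin = origin_of(proposal.target_url)
        if origin not in allowed_origins:
            # Deterministic gate: outside the task's origin set.
            feedback = f"origin {origin} is outside the task's origin set"
            continue
        verdict = critic(goal, proposal)
        if verdict.approved:
            return proposal
        feedback = verdict.feedback  # veto: let the planner try again
    return None
```

In this sketch the critic never receives page text, so a prompt injected into a website has no direct channel to it; whether the real system layers the origin check and the critic in this order is not stated in the source.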
Read at The Hacker News