RAG Predictive Coding for AI Alignment Against Prompt Injections and Jailbreaks | HackerNoon
Briefly

AI chatbots should develop an understanding of expected prompts, improving the effectiveness of their responses and reducing the chance of being manipulated through prompt injections and jailbreaks.
By implementing 'expectation' mechanisms in prompts, AI can better anticipate inputs that challenge safety, creating a framework for identifying high-risk interactions and strengthening overall alignment.
Jailbreaks and prompt injections present vulnerabilities for AI chatbots; establishing a structured expectation system could limit these risks and enhance their safety measures.
Current AI chatbot architectures lack differentiation in input combinations, leaving them exposed to unpredictable prompts. A focus on expectation could serve as a foundational step towards AI safety.
Read at Hackernoon
[
]
[
|
]