Anthropic researchers have discovered 'many-shot jailbreaking,' a technique that can coax large language models into revealing sensitive information after they are primed with a long series of less-harmful questions; the sketch after these points illustrates the prompt structure.
Models with expanded context windows excel at tasks when the prompt contains many examples, but the same in-context learning that boosts performance can also produce unintended behaviors.
The mechanism that lets a model home in on a user's request buried in a long prompt is still poorly understood, which helps explain why it can be steered into answering inappropriate questions.
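The attack relies on structure rather than any single clever phrasing: a large number of priming question-and-answer exchanges is stacked ahead of the real request, which only fits inside models with very long context windows. The Python sketch below is purely illustrative, with a hypothetical helper name and harmless placeholder dialogues; it shows only that prompt layout, not the actual content used in the research.

```python
# Illustrative sketch of how a "many-shot" prompt is assembled: a long run of
# faux question/answer exchanges precedes the final target question, so the
# model treats the pattern as in-context examples. Placeholder content only.

def build_many_shot_prompt(faux_dialogues, target_question):
    """Concatenate many priming Q&A pairs ahead of the real question."""
    shots = []
    for question, answer in faux_dialogues:
        shots.append(f"Human: {question}\nAssistant: {answer}")
    # The final turn is left open so the model completes the answer itself.
    shots.append(f"Human: {target_question}\nAssistant:")
    return "\n\n".join(shots)

if __name__ == "__main__":
    # Hypothetical priming pairs; the published attack uses hundreds of them.
    priming = [(f"Example question {i}?", f"Example answer {i}.") for i in range(256)]
    prompt = build_many_shot_prompt(priming, "Final question the prompt is really after?")
    print(prompt[:300])  # preview the start of the assembled prompt
```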