UK researchers found AI chatbots vulnerable to simple techniques that bypass safeguards against issuing illegal, toxic, or explicit responses.
Basic attacks, such as opening a prompt with a seemingly innocent phrase, can circumvent a chatbot's safeguards and elicit harmful outputs with little effort.
Developers emphasize internal testing to prevent harmful responses, yet AI language models remain vulnerable to harmful prompts.
The UK's AI Safety Institute found that even newly released large language models were highly susceptible to specific text prompts designed to elicit harmful responses.