Artificial intelligence · Techzine Global · 3 days ago
Safety mechanisms of AI models more fragile than expected
A single unlabeled training prompt can undermine safety alignment in large language models.
Artificial intelligence · Computerworld · 2 months ago
Get poetic in prompts and AI will break its guardrails
Across 25 frontier proprietary and open-weight models, prompts rewritten in verse achieved high attack-success rates, bypassing guardrails and eliciting harmful instructions.