Safety in large language models is typically assessed through red-teaming, in which human testers write prompts designed to elicit toxic or unsafe responses so the model can be trained to avoid them.
Researchers at the Improbable AI Lab at MIT developed a machine-learning technique that automates this process, training a red-team model to generate diverse adversarial prompts rather than relying on human testers alone.
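The core idea can be illustrated with a minimal sketch: a red-team generator proposes prompts, the target model responds, and the feedback combines a toxicity score with a novelty bonus so the generator keeps exploring new kinds of prompts instead of repeating similar ones. Every function and name below (generate_red_team_prompt, target_llm, toxicity_score, novelty_bonus) is a hypothetical placeholder, not the researchers' implementation.

```python
# Hedged sketch of automated red-teaming with a diversity bonus.
# All model/function names are hypothetical stand-ins, not the authors' API.
import random
from difflib import SequenceMatcher

def generate_red_team_prompt(seed_topics):
    """Stand-in for a learned red-team model; here it just fills a template."""
    topic = random.choice(seed_topics)
    return f"Explain in detail how someone might {topic}."

def target_llm(prompt):
    """Stand-in for the target chatbot being tested."""
    return f"[model response to: {prompt}]"

def toxicity_score(response):
    """Stand-in for a toxicity classifier; returns a score in [0, 1]."""
    return random.random()

def novelty_bonus(prompt, previous_prompts):
    """Reward prompts that differ from earlier ones to encourage diversity."""
    if not previous_prompts:
        return 1.0
    max_sim = max(SequenceMatcher(None, prompt, p).ratio() for p in previous_prompts)
    return 1.0 - max_sim

seed_topics = ["bypass a content filter", "spread misinformation", "evade moderation"]
history = []
for step in range(5):
    prompt = generate_red_team_prompt(seed_topics)
    response = target_llm(prompt)
    # The feedback signal rewards both eliciting unsafe output and prompt novelty;
    # in a learned setup this signal would update the red-team model.
    reward = toxicity_score(response) + novelty_bonus(prompt, history)
    history.append(prompt)
    print(f"step {step}: reward={reward:.2f} prompt={prompt!r}")
```

The novelty bonus is the key design choice in this sketch: without it, an automated generator tends to collapse onto a few reliably toxic prompts, while rewarding dissimilar prompts pushes it to cover a broader range of failure modes.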