Integrated Speech Language Models Face Critical Safety Vulnerabilities | HackerNoon
Briefly

This study explores the safety alignment of speech language models in a Spoken QA context. It assesses both in-house and public models under adversarial attacks, revealing vulnerabilities where attackers can jailbreak models using minimal perturbations. Remarkably, adversarial perturbations from one model can successfully compromise another. The findings highlight various levels of robustness among models and underscore the necessity for effective countermeasures, such as noise-flooding defenses, to ensure safety alignment. This pioneering research calls attention to the urgent need for thorough understanding and protection against potential threats as these technologies gain popularity.
To accurately determine the safety alignment of these models, we developed a comprehensive evaluation setup using a publicly available LLM.
We also showed the effectiveness of a noise-flooding defense in countering the attacks.
Adversarial perturbations generated using one model can jailbreak a different model with reasonable success.
This is the first study to investigate the potential safety vulnerability of integrated speech and language models.
Read at Hackernoon
[
|
]