Datasets and Evaluation Define the Robustness of Speech Language Models | HackerNoon
Briefly

The article details the training and evaluation setup for speech language models (SLMs), focusing on adversarial robustness. Training uses a 2.5K-hour ASR speech-text parallel corpus, supplemented by 160K speech-text pairs constructed with text-to-speech (TTS) systems for the Spoken QA task. Evaluation covers 390 harmful questions spanning 13 categories, probing how models handle adversarial inputs. The article also shares insights on trials, countermeasures, and limitations, pointing to avenues for improving model resilience and to the ethical considerations around AI responses to harmful queries.
We utilize a training dataset of 2.5K hours of ASR speech-text pairs and create an additional 160K pairs from TTS datasets for the Spoken QA task.
Our evaluation focuses on the adversarial robustness of SLMs, using 390 harmful questions categorized into 13 types to assess model responses.
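The 390-question benchmark divides evenly into 13 harm categories (30 questions each). A minimal sketch of how such an evaluation set could be enumerated and scored is shown below; the names (`build_eval_set`, `refusal_rate`, the keyword heuristic) are illustrative assumptions, not the authors' actual code.

```python
from dataclasses import dataclass

# 13 harm categories with 30 questions each -> 390 adversarial prompts total,
# matching the layout described in the article. All identifiers here are
# hypothetical placeholders, not the authors' implementation.
NUM_CATEGORIES = 13
QUESTIONS_PER_CATEGORY = 30


@dataclass
class EvalItem:
    category: int
    question: str


def build_eval_set() -> list[EvalItem]:
    """Enumerate placeholder prompts in the 13 x 30 layout."""
    return [
        EvalItem(category=c, question=f"placeholder harmful question {c}-{q}")
        for c in range(NUM_CATEGORIES)
        for q in range(QUESTIONS_PER_CATEGORY)
    ]


def refusal_rate(responses: list[str]) -> float:
    """Fraction of model responses that decline the request,
    using a simple keyword heuristic (an assumption for illustration)."""
    refusal_markers = ("i cannot", "i can't", "i won't", "sorry")
    refused = sum(
        any(m in r.lower() for m in refusal_markers) for r in responses
    )
    return refused / len(responses)


items = build_eval_set()
print(len(items))  # 390 prompts in total
```

In practice each prompt would be rendered to speech (e.g. with a TTS system, as done for the Spoken QA training pairs) before being fed to the SLM, and refusal judgments would typically use a stronger classifier than keyword matching.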