Evaluating Multimodal Speech Models Across Diverse Audio Tasks
Briefly

This article covers the implementation and evaluation of SpeechVerse, a multimodal model designed for automatic speech recognition (ASR), spoken language understanding (SLU), and paralinguistic speech processing (PSP). Training draws on a variety of publicly available speech datasets, paired with a structured evaluation protocol. The model generalizes across differently worded instructions, which improves performance on multiple classification tasks. Training strategies such as multimodal fine-tuning and curriculum learning strengthen the model's effectiveness and underline the importance of a diverse prompt set during training, offering practical insights for speech analytics.
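The article attributes much of the model's effectiveness to multimodal fine-tuning and curriculum learning. The sketch below is a minimal, hypothetical illustration of how such staged training is commonly set up: first adapting only a small speech-to-text projection module, then unfreezing lightweight language-model adapters for the full multi-task mixture. The module names (`speech_encoder`, `projector`, `llm_adapters`) and learning rates are assumptions for illustration, not SpeechVerse's actual implementation.

```python
# Hypothetical sketch of a two-stage curriculum for multimodal fine-tuning.
# Module names and hyperparameters are illustrative assumptions only.
import torch


def set_trainable(module: torch.nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = trainable


def curriculum_stages(model, easy_tasks, all_tasks):
    """Yield (stage_name, task_mixture, optimizer) for each curriculum stage."""
    # Stage 1: train only the projector on "easier" tasks (e.g. ASR),
    # keeping the pretrained speech encoder and language model frozen.
    set_trainable(model.speech_encoder, False)
    set_trainable(model.llm, False)
    set_trainable(model.projector, True)
    opt1 = torch.optim.AdamW(model.projector.parameters(), lr=1e-4)
    yield "stage1_projector", easy_tasks, opt1

    # Stage 2: additionally unfreeze lightweight LLM adapters and train on
    # the full multi-task mixture (ASR + SLU + paralinguistic tasks).
    set_trainable(model.llm_adapters, True)
    params = list(model.projector.parameters()) + list(model.llm_adapters.parameters())
    opt2 = torch.optim.AdamW(params, lr=5e-5)
    yield "stage2_multitask", all_tasks, opt2
```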
We utilize a wide range of speech datasets for training, spanning automatic speech recognition, spoken language understanding, and paralinguistic processing tasks.
By employing a diverse set of prompts and tasks, we aim to improve the model's generalization to unseen intent and slot classes.
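As a rough illustration of what a diverse prompt set can look like, the snippet below samples varied instruction phrasings and candidate label sets per task. The task names and templates here are hypothetical examples, not the prompts used in the article.

```python
# Illustrative prompt pool for multi-task speech instruction tuning.
# Task names and templates are assumptions for illustration only.
import random

PROMPT_TEMPLATES = {
    "asr": [
        "Transcribe the audio verbatim.",
        "Write down exactly what the speaker says.",
    ],
    "intent_classification": [
        "Which of these intents does the utterance express: {labels}?",
        "Classify the speaker's intent. Possible intents: {labels}.",
    ],
    "emotion_recognition": [
        "What emotion is the speaker conveying? Choose from: {labels}.",
        "Identify the speaker's emotional state ({labels}).",
    ],
}


def sample_prompt(task: str, labels: list[str] | None = None) -> str:
    """Pick a random phrasing for the task so the model sees varied instructions."""
    template = random.choice(PROMPT_TEMPLATES[task])
    return template.format(labels=", ".join(labels)) if labels else template


# Varying both the instruction wording and the candidate label set at training
# time is what supports generalization to unseen intent or slot classes.
print(sample_prompt("intent_classification", ["play_music", "set_alarm"]))
```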