Experiments on SEAMLESSEXPRESSIVELM Reveal Its Superiority in Efficiency and Translation Quality | HackerNoon
Briefly

The article reports an empirical evaluation of speech-to-speech translation that preserves speaker style. Using an in-house dataset of 250k Spanish-English and 300k Hungarian-English speech pairs, the study measures translation quality with both automatic and subjective metrics. The automatic metrics are ASRBLEU and vocal style similarity, the latter computed as the cosine similarity of embeddings from a pre-trained WavLM encoder. Subjective speech quality is reported as a Mean Opinion Score from two annotators who rated the model outputs. The findings underscore the importance of preserving vocal characteristics when translating speech across languages.
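To make the vocal style similarity metric concrete, here is a minimal sketch of cosine similarity between two embedding vectors. It assumes the embeddings have already been extracted and pooled into fixed-length vectors; the function name and the toy values are illustrative, not taken from the study, and the WavLM extraction step is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings standing in for encoder outputs
# (real speaker embeddings have far more dimensions).
source_embedding = [0.2, 0.1, 0.7]
output_embedding = [0.3, 0.2, 0.6]

print(cosine_similarity(source_embedding, output_embedding))
```

Values near 1 indicate that the two vectors point in nearly the same direction, which under this metric is read as similar vocal style.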
In this study, we focus on speech-to-speech translation that preserves speaker style, covering two language pairs: Spanish-English and Hungarian-English.
We employed an in-house dataset of 250k Spanish-English and 300k Hungarian-English speech pairs and used various metrics to evaluate translation and style transfer.
To assess the generated audio, we used ASRBLEU for semantic quality and introduced vocal style similarity metrics, which are crucial for evaluating the effectiveness of style transfer.
Subjective evaluations of speech quality were conducted using Mean Opinion Score, allowing us to gauge listener satisfaction with the generated translations.
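As a rough illustration of how a Mean Opinion Score can be aggregated, the sketch below averages each item's annotator ratings and then averages across items. The 1-to-5 rating scale, the function name, and the sample ratings are assumptions for illustration; the summary does not state the scale or the aggregation procedure the study used.

```python
from statistics import mean


def mean_opinion_score(ratings_per_item):
    """Average annotator ratings per item, then average across items.

    ratings_per_item: one list of ratings per evaluated output,
    e.g. two ratings when two annotators score each output.
    """
    if not ratings_per_item:
        raise ValueError("need at least one rated item")
    per_item_scores = [mean(ratings) for ratings in ratings_per_item]
    return mean(per_item_scores)


# Hypothetical ratings from two annotators on three outputs (assumed 1-5 scale).
ratings = [[4, 5], [3, 3], [5, 4]]
print(mean_opinion_score(ratings))
```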
Read at Hackernoon