In our experiments, we focused on zero-shot TTS performance, comparing our model with VALL-E, NaturalSpeech 2, and StyleTTS 2. Our model achieved superior naturalness scores.
Our model also showed significantly higher similarity between the prompts and the generated speech, but lower similarity to the ground-truth samples, which suggests that it produces a distinct style.
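
The text above does not say how similarity was measured. A common choice in zero-shot TTS evaluation is cosine similarity between speaker embeddings, and the sketch below assumes that metric purely for illustration. The function name `cosine_similarity` and the toy three-dimensional vectors are hypothetical; in practice the embeddings would come from a speaker-verification model, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not real data: the generated utterance is
# placed close to the prompt and further from the ground-truth recording.
prompt_emb = [1.0, 0.0, 0.0]
generated_emb = [0.9, 0.1, 0.0]
ground_truth_emb = [0.5, 0.5, 0.0]

# The two comparisons discussed in the text: generated vs. prompt,
# and generated vs. ground truth.
sim_prompt = cosine_similarity(generated_emb, prompt_emb)
sim_ground_truth = cosine_similarity(generated_emb, ground_truth_emb)

print(f"similarity to prompt:       {sim_prompt:.3f}")
print(f"similarity to ground truth: {sim_ground_truth:.3f}")
```

With these toy vectors the similarity to the prompt is higher than the similarity to the ground truth, which is the pattern the text reports. The numbers themselves carry no meaning beyond the illustration.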