Zero-shot Text-to-Speech: How Does the Performance of HierSpeech++ Fare With Other Baselines? | HackerNoon
Briefly

In our evaluation, HierSpeech++ consistently outperformed the baselines on both subjective and objective metrics for zero-shot text-to-speech synthesis, and it even surpassed the ground-truth recordings in naturalness.
Although HierSpeech++ achieved strong results in naturalness and overall quality, XTTS scored higher on pMOS, which suggests that our model still has room for improvement.