HierSpeech++: All the Amazing Things It Could Do

from Hackernoon 1 year ago

In this work, we propose HierSpeech++, which achieves human-level, high-quality zero-shot speech synthesis performance. We introduce an efficient and powerful speech synthesis framework by disentangling semantic modeling, the speech synthesizer, and speech super-resolution.
Hackernoon: https://hackernoon.com/hierspeech-all-the-amazing-things-it-could-do

Moreover, we achieve this performance using only a small-scale open-source dataset, LibriTTS. In addition, our model has significantly faster inference than recently proposed zero-shot speech synthesis models.

Furthermore, we introduce style prompt replication for voice cloning from a 1-second prompt, and noise-free speech synthesis by adopting a denoised style prompt.
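The summary names style prompt replication but does not spell out the mechanics. A minimal sketch, assuming "replication" means concatenating copies of the short prompt so the style encoder sees a longer input, might look like the following. The function name `replicate_style_prompt`, the 16 kHz sampling rate, the test tone, and the repeat count are all hypothetical and chosen only for illustration; none of them come from the HierSpeech++ code.

```python
import math
from typing import List


def replicate_style_prompt(prompt: List[float], repeats: int) -> List[float]:
    """Concatenate `repeats` copies of a short mono prompt waveform.

    Hypothetical helper: it only illustrates the idea of lengthening a
    very short style prompt by repeating it end to end.
    """
    if not prompt:
        raise ValueError("prompt must contain at least one sample")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return list(prompt) * repeats


SAMPLE_RATE = 16_000  # assumed sampling rate in Hz, not taken from the source

# Stand-in for a 1-second voice prompt: a 220 Hz sine tone.
one_second_prompt = [
    math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)
]

# Repeat the prompt 5 times (an arbitrary illustrative count).
replicated = replicate_style_prompt(one_second_prompt, repeats=5)
duration_seconds = len(replicated) / SAMPLE_RATE
```

In this sketch the replicated waveform, rather than the raw 1-second clip, would be what gets passed on to whatever component extracts the speaker style.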

In future work, we will extend the model to cross-lingual and emotion-controllable speech synthesis by utilizing pre-trained models.


#speech-synthesis #zero-shot-learning #hierarchical-models #voice-cloning #neural-networks
