We also introduce style prompt replication (SPR), a simple technique that transfers style even from a one-second speech prompt and improves synthesis from short prompts.
The prompt is replicated n times and the result is fed to the style encoder to extract the style representation, enabling synthesis from short prompts that would otherwise typically produce errors.
In effect, SPR makes a short prompt appear longer to the style encoder, which allows the model to generate high-fidelity speech from it.
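As a rough illustration of the replication step, the sketch below tiles a prompt spectrogram n times along the time axis before it would be passed to the style encoder. It assumes the prompt is represented as an (n_mels, T) mel-spectrogram and that replication is whole-sequence concatenation along time; `replicate_prompt`, `replication_factor`, the 80-bin/86-frame toy prompt, and the 300-frame target are hypothetical choices for illustration, not the authors' code or settings.

```python
import math

import numpy as np


def replicate_prompt(mel: np.ndarray, n: int) -> np.ndarray:
    """Concatenate a prompt spectrogram with itself n times along the time axis.

    Assumes `mel` has shape (n_mels, T); returns shape (n_mels, n * T).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # reps=(1, n) repeats the whole sequence along axis 1: [x, x, ..., x].
    # This differs from np.repeat, which would repeat each frame in place.
    return np.tile(mel, (1, n))


def replication_factor(num_frames: int, target_frames: int) -> int:
    """Hypothetical helper: smallest n >= 1 with n * num_frames >= target_frames."""
    if num_frames <= 0:
        raise ValueError("prompt must contain at least one frame")
    return max(1, math.ceil(target_frames / num_frames))


if __name__ == "__main__":
    # Toy one-second prompt: 80 mel bins x 86 frames (about 22.05 kHz with hop 256).
    rng = np.random.default_rng(0)
    short_prompt = rng.standard_normal((80, 86)).astype(np.float32)

    n = replication_factor(short_prompt.shape[1], target_frames=300)
    long_prompt = replicate_prompt(short_prompt, n)
    print(n, long_prompt.shape)  # 4 (80, 344)
    # In an SPR-style pipeline, long_prompt would replace short_prompt
    # as the input to the style encoder.
```

Tiling the whole sequence, rather than stretching individual frames, keeps the local spectral content of the prompt unchanged while giving the encoder a longer input; how the real system picks n is not specified here.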