Microsoft has shown off its latest research in text-to-speech AI with a model called VALL-E that can simulate someone's voice from just a three-second audio sample, Ars Technica has reported. The speech can match not only the timbre but also the emotional tone of the speaker, and even the acoustics of a room.