The SpeechSR model performs efficient speech super-resolution, upsampling from 16 kHz to 48 kHz while outperforming existing models in both performance and inference speed.
Our SpeechSR model achieves significant improvements in speech super-resolution through a simplified architecture, outperforming multi-task models by focusing solely on upsampling from 16 kHz to 48 kHz.
The preference test shows that speech upsampled by SpeechSR is preferred over the original speech, further confirming its effectiveness in practical applications.
With much faster inference and a markedly smaller parameter count, SpeechSR stands out as a strong candidate for real-world speech super-resolution tasks.