The article discusses advances in voice cloning technologies relevant to audiobook services, particularly tools such as CosyVoice, F5-TTS, and GPT-SoVITS. The GPT-SoVITS project saw significant quality improvements from version 2 to version 4. Installing it on Linux involves creating a Python environment and installing dependencies such as FFmpeg, and parts of the process can be automated with scripts. A new version, 20250606v2pro, has since been released and may differ noticeably from 20250422v4, the version used here.
The model's quality improved greatly from v2 to v4. Errors are still unavoidable when synthesizing long texts, but the output is good enough for my use case.
At the time of writing, the project has released a new version, 20250606v2pro, which may differ from what I describe here, since I was using version 20250422v4.
The installation on Linux involves creating a new Python environment with conda and installing system dependencies such as FFmpeg and libsox-dev.
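Before installing GPT-SoVITS itself, it can help to confirm that these prerequisites are actually in place. The following is a minimal Python sketch of such a check, not part of the GPT-SoVITS project. The function name `check_prerequisites` is hypothetical. It assumes that conda exposes the active environment through the `CONDA_DEFAULT_ENV` variable, and that locating the `sox` shared library is a reasonable proxy for libsox-dev being installed.

```python
from __future__ import annotations

import ctypes.util
import os
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Hypothetical helper: report whether the Linux prerequisites appear to be present."""
    return {
        # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
        "conda_env_active": bool(os.environ.get("CONDA_DEFAULT_ENV")),
        # The ffmpeg executable must be reachable on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Assumption: finding the libsox shared library by its short name "sox"
        # is taken as a sign that libsox-dev (or its runtime package) is installed.
        "libsox_found": ctypes.util.find_library("sox") is not None,
    }


if __name__ == "__main__":
    # Print one line per check so missing pieces are easy to spot.
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it from inside the activated conda environment. Any entry reported as MISSING points to a setup step worth revisiting before launching the WebUI.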
Although GPT-SoVITS can be deployed on Linux servers, the project's most user-friendly option is its one-click Windows package.