Transformers.js: ML for the Web, Now with Text-to-Speech
Briefly

Transformers.js, the JavaScript counterpart to the Python Transformers library, is designed for running Transformer models directly in web browsers, eliminating the need for external server processing. The recent 2.7 release introduced several enhancements, most notably text-to-speech (TTS) support. This upgrade, which responds to user demand, broadens the library's applicability to additional use cases.
Text-to-speech (TTS) is the task of generating natural-sounding speech from text, potentially in multiple languages and with multiple speakers. Currently, Transformers.js supports TTS only with Xenova/speecht5_tts, which is based on Microsoft's SpeechT5 with ONNX weights; support for Bark and MMS is planned for future updates. Developers access the functionality through the pipeline function from @xenova/transformers, specifying the 'text-to-speech' task, the 'Xenova/speecht5_tts' model, and the option { quantized: false }. A link to a file containing speaker embeddings, which determine the voice, is also passed in. Applying the TTS pipeline to a given text returns an audio array together with its sampling rate. This array represents the synthesized speech, which can be further processed or played directly in the browser.
Transformers.js also caters to a range of other use cases, including style transfer, image inpainting, image colorization, and super-resolution. Its versatility and regular updates make it a valuable tool for developers exploring the intersection of machine learning and web development.
Read at InfoQ