
"On Wednesday, the Paris-based AI lab released two new speech-to-text models: Voxtral Mini Transcribe V2 and Voxtral Realtime. The former is built to transcribe audio files in large batches and the latter for nearly real-time transcription, within 200 milliseconds; both can translate between 13 languages. Voxtral Realtime is freely available under an open source license."
"According to Mistral, the new models are both cheaper to run and less error-prone than competing alternatives. Mistral has pitched Voxtral Realtime—though the model outputs text, not speech—as a marked step towards free-flowing conversation across the language barrier, a problem Apple and Google are also competing to solve. 'What we are building is a system to be able to seamlessly translate. This model is basically laying the groundwork for that,' claims Pierre Stock. 'I think this problem will be solved in 2026.'"
Mistral AI released two new speech-to-text models: Voxtral Mini Transcribe V2, for batch transcription of audio files, and Voxtral Realtime, for near-real-time transcription within 200 milliseconds. Both can translate between 13 languages. Voxtral Realtime is available under an open-source license and outputs text rather than synthesized speech. At four billion parameters, the models are small enough to run locally on phones or laptops, which reduces reliance on cloud services and improves privacy. Mistral says the models are cheaper to run and less error-prone than competing alternatives, and it positions Voxtral Realtime as groundwork toward seamless multilingual conversation, a problem that Apple, Google, and others are also competing to solve. Founded in 2023 by Meta and DeepMind alumni, Mistral emphasizes model design and dataset optimization to get the most performance out of limited compute and funding.
Read at WIRED