Mistral has launched Voxtral, its first open family of audio models, aimed at businesses. Voxtral is positioned as a speech intelligence option that is usable in production and balances affordability with functionality; Mistral claims it costs less than half as much as existing closed solutions. The models can transcribe up to 30 minutes of audio and, using their LLM backbone, understand up to 40 minutes. Multilingual support covers several major languages. Two variants are available: Voxtral Small for large-scale deployments and Voxtral Mini for local use, along with an optimized transcription-only version.
Mistral's Voxtral is an open audio model family that provides speech intelligence usable in production, offering businesses an affordable alternative to closed systems.
Voxtral can transcribe up to 30 minutes of audio and understand up to 40 minutes, so users can ask questions about the audio content.