
"Cohere's Transcribe model is designed for tasks like note-taking and speech analysis, supporting 14 languages and optimized for consumer-grade GPUs, making it accessible for self-hosting."
"Transcribe achieved an average word error rate of 5.42, outperforming models like Zoom Scribe v1 and IBM Granite 4.0 1B, demonstrating its effectiveness in speech recognition."
"The model can process 525 minutes of audio in just one minute, showcasing its efficiency and capability within its class of speech recognition models."
Cohere introduced Transcribe, an open-source automatic speech recognition model designed for note-taking and speech analysis. With 2 billion parameters, it is optimized for consumer-grade GPUs and supports 14 languages. Transcribe leads competing models on the Hugging Face Open ASR leaderboard with a 5.42% average word error rate. In human evaluations of transcription quality it has a 61% win rate, although it struggles with Portuguese, German, and Spanish. The model can process 525 minutes of audio per minute of processing time. It will be integrated into Cohere's enterprise platform and offered for free via API.
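For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the leaderboard's scoring code: the function name `word_error_rate` is hypothetical, and normalizing by lowercasing and splitting on whitespace is an assumption (real benchmarks typically apply more extensive text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and
    whitespace splitting, which is simpler than benchmark practice.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a 5.42% average corresponds to roughly five or six word errors per 100 reference words, averaged across the benchmark's test sets.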
Read at TechCrunch