
"MAI-Transcribe-1 delivers enterprise-grade accuracy across 25 languages at approximately 50 percent lower GPU cost than leading alternatives, making it a competitive option for businesses."
"MAI-Voice-1 can produce 60 seconds of audio in less than a second on a single GPU, showcasing its efficiency and potential for rapid speech generation."
"These models are well-suited for common enterprise use cases, such as designing customer support agents that can recognize speech and generate a response."
"Microsoft's models are available exclusively on Foundry for developers to use, already powering products like Copilot, Bing, and Azure Speech."
Microsoft unveiled public preview versions of three machine learning models: MAI-Transcribe-1 for speech recognition, MAI-Voice-1 for speech synthesis, and MAI-Image-2 for image generation. MAI-Transcribe-1 covers 25 languages at approximately 50 percent lower GPU cost than leading alternatives, and MAI-Voice-1 can produce 60 seconds of audio in less than a second on a single GPU. The models are available through Foundry and target enterprise applications such as customer support agents and media subtitling. Microsoft already uses them in its own products, including Copilot, Bing, and Azure Speech.
Read at The Register