
With benchmark claims and Apache 2.0 licensing, Qwen3-Omni challenges Western rivals while raising fresh questions for enterprise adoption. Alibaba is taking direct aim at US tech giants with Qwen3-Omni, a new open-source AI model that processes text, images, audio, and video and is freely available under the enterprise-friendly Apache 2.0 license. The release positions Alibaba as a potential alternative to OpenAI and Google by offering enterprises a no-cost way to deploy multimodal AI at scale.
"Qwen3-Omni adopts the Thinker-Talker architecture," Alibaba said in a blog post. "Thinker is tasked with text generation while Talker focuses on generating streaming speech tokens by [receiving] high-level representations directly from Thinker. To achieve ultra-low-latency streaming, Talker autoregressively predicts a multi-codebook sequence." According to the company, Qwen3-Omni performed on par with single-modal models in its Qwen series and showed stronger results on audio tasks.
Alibaba has released Qwen3-Omni, an open-source multimodal AI model under the Apache 2.0 license that handles text, images, audio, and video. The model uses a Thinker-Talker architecture in which Thinker generates text and Talker produces low-latency streaming speech via autoregressive prediction of multi-codebook token sequences. Alibaba reports performance on par with its single-modal Qwen models, stronger results on audio tasks, and top rankings on 32 benchmarks among open-source models and 22 benchmarks overall, outperforming several closed-source competitors. The permissive Apache 2.0 license allows in-house deployment and customization and reduces vendor lock-in, extending Alibaba's open-source reach while raising fresh questions about enterprise adoption.
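To make the Thinker-Talker flow concrete, here is a minimal, purely illustrative Python sketch of the pipeline shape the company describes: a "Thinker" stage emits text tokens plus a high-level representation, and a "Talker" stage autoregressively predicts one frame of multi-codebook speech tokens per step, streaming each frame as soon as it is produced. All names, shapes, and the codebook count are assumptions for illustration; this is not Alibaba's actual API or model logic.

```python
# Hypothetical sketch of a Thinker-Talker style streaming pipeline.
# Everything here (names, sizes, the "prediction" rule) is illustrative,
# not Qwen3-Omni's real implementation.

from dataclasses import dataclass
from typing import Iterator

NUM_CODEBOOKS = 4  # assumption: small multi-codebook size for illustration


@dataclass
class ThinkerStep:
    token: str           # text token emitted by the Thinker
    hidden: list[float]  # high-level representation passed to the Talker


def thinker(prompt: str) -> Iterator[ThinkerStep]:
    """Toy 'Thinker': yields text tokens plus a stand-in hidden state."""
    for i, word in enumerate(prompt.split()):
        yield ThinkerStep(token=word, hidden=[float(i)] * 8)


def talker(steps: Iterator[ThinkerStep]) -> Iterator[list[int]]:
    """Toy 'Talker': for each Thinker step, predicts one frame of
    NUM_CODEBOOKS codec tokens, conditioning on the previous frame
    (the autoregressive part) and streaming it immediately."""
    prev_frame = [0] * NUM_CODEBOOKS
    for step in steps:
        # Stand-in "prediction": mix the previous frame with the hidden state.
        frame = [(prev_frame[c] + int(sum(step.hidden)) + c) % 1024
                 for c in range(NUM_CODEBOOKS)]
        prev_frame = frame
        yield frame  # emitted per step rather than after full generation


# Each input token yields one speech-token frame as soon as it is ready,
# which is the low-latency property the architecture is built around.
frames = list(talker(thinker("hello multimodal world")))
```

The design point the sketch captures is that Talker consumes Thinker's representations incrementally instead of waiting for the full text, which is what enables streaming speech output.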
Read at Computerworld