New Alibaba model Qwen3-Omni heightens competition in multimodal AI
Briefly

"With benchmark claims and Apache 2.0 licensing, it challenges Western rivals while raising fresh questions for enterprise adoption. Alibaba is taking direct aim at US tech giants with Qwen3-Omni, a new open-source AI model that processes text, images, audio, and video, and is freely available under the enterprise-friendly Apache 2.0 license. The release positions Alibaba as a potential alternative to OpenAI and Google by offering enterprises a no-cost way to deploy multimodal AI at scale."
""Qwen3-Omni adopts the Thinker-Talker architecture," Alibaba said in a blog post. "Thinker is tasked with text generation while Talker focuses on generating streaming speech tokens by receives [receiving] high-level representations directly from Thinker. To achieve ultra-low-latency streaming, Talker autoregressively predicts a multi-codebook sequence." According to the company, Qwen3-Omni performed on par with single-modal models in its Qwen series and showed stronger results in audio tasks."
Alibaba has released Qwen3-Omni, an open-source multimodal AI model that handles text, images, audio, and video under the Apache 2.0 license. It uses a Thinker-Talker architecture: Thinker generates text, while Talker produces low-latency streaming speech tokens by autoregressively predicting a multi-codebook sequence. Alibaba reports parity with its single-modal Qwen models, stronger audio-task performance, and top rankings on 32 open-source benchmarks and 22 benchmarks overall, ahead of several closed-source rivals. The permissive license lets enterprises deploy and customize the model in-house and reduces vendor lock-in, though independent validation of the benchmark claims remains an open question for adopters.
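For teams weighing in-house deployment, the usual entry point for open-weight models is Hugging Face Transformers. The snippet below is a minimal sketch under stated assumptions: the repository id is hypothetical, and the generic Auto classes may differ from whatever loader the official Qwen3-Omni release recommends, so check Alibaba's Qwen organization on the Hub before relying on it.

from transformers import AutoModel, AutoProcessor

# Hypothetical repo id; look up the exact name on Alibaba's Qwen Hub page.
MODEL_ID = "Qwen/Qwen3-Omni"

# Apache 2.0 permits downloading the weights and running them on your own
# hardware, with no per-call fees and no restriction on commercial use.
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, device_map="auto")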
Read at Computerworld