
"Nvidia has published new benchmark results showing that its latest AI server platform, the GB200 NVL72, significantly improves the performance of modern mixture-of-experts (MoE) models. According to the company, recent models, including Moonshot AI's Kimi K2 Thinking and DeepSeek's models, run up to 10 times faster than on previous-generation systems. Mixture-of-experts models assume that not all parts of a large language model need to be deployed at once. A prompt is divided into sub-questions that are processed by specialized sub-models, the experts."
"The approach gained widespread attention after DeepSeek demonstrated in early 2025 that an efficiently designed MoE model could compete with models that required much more GPU time. Since then, OpenAI, Mistral AI, Moonshot AI, and others have incorporated the architecture into their latest-generation models. Nvidia attributes the performance gains of the NVL72 to the system's scalability, with 72 GPUs linked within a single node, and to improved NVLink connections between those chips."
Nvidia's GB200 NVL72 server platform significantly increases performance for modern mixture-of-experts (MoE) models, with some workloads running up to ten times faster than on previous-generation systems. MoE models activate only a small subset of specialized sub-models, the experts, for each input, which keeps compute costs down while expanding total model capacity. The NVL72's gains stem from 72 GPUs linked within a single node and improved NVLink connections between them, which allow inputs to be routed to the active experts more efficiently and let those experts run in parallel. After recent MoE designs showed they could compete with models requiring far more GPU time, companies such as OpenAI, Mistral AI, Moonshot AI, and DeepSeek adopted the architecture. Chinese models, notably Moonshot AI's Kimi K2 Thinking and DeepSeek's models, feature prominently in Nvidia's benchmark results.
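The routing idea behind MoE can be illustrated with a short sketch. The following is a minimal, hypothetical top-k gating example in plain NumPy; the names, dimensions, and single-matrix "experts" are illustrative assumptions rather than any specific model's or Nvidia's implementation, but they show why only a fraction of a model's parameters does work for each token.

```python
import numpy as np

# Hypothetical sizes, far smaller than any production model.
D_MODEL, N_EXPERTS, TOP_K = 16, 8, 2

rng = np.random.default_rng(0)

# Each expert is stubbed as a single weight matrix; real experts are feed-forward blocks.
expert_weights = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL)) * 0.02
# The router scores every expert for a given token representation.
router_weights = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02


def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route one token through its top-k experts and mix their outputs."""
    scores = token @ router_weights            # shape (N_EXPERTS,)
    top_idx = np.argsort(scores)[-TOP_K:]      # indices of the k highest-scoring experts
    gates = np.exp(scores[top_idx] - scores[top_idx].max())
    gates /= gates.sum()                       # softmax over the selected experts only

    # Only the selected experts execute; the rest stay idle,
    # which is where the compute savings described above come from.
    out = np.zeros_like(token)
    for gate, idx in zip(gates, top_idx):
        out += gate * (expert_weights[idx] @ token)
    return out


print(moe_forward(rng.standard_normal(D_MODEL)).shape)  # -> (16,)
```

In large deployments the experts are spread across many GPUs, which is why the NVLink bandwidth highlighted above matters: token representations and expert outputs have to move quickly between the chips hosting whichever experts the router selects.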
Read at Techzine Global