"The company plans to release a new inference-optimized chip, the M100, next year. It was developed by its Kunlunxin chip unit to better serve the next generation of mixture-of-experts (MoE) models. As we've recently explored, MoE architectures present specific challenges for inference at scale, particularly as models grow beyond a single accelerator or server. In many cases, interconnect bandwidth and latency become a bottleneck, inhibiting performance."
"It appears Baidu aims to sidestep this particular issue by building larger compute domains, similar to what AMD and Nvidia are doing with their own rack-scale architectures. Baidu plans to offer the chips in a clustered configuration called the Tianchi256 beginning in early 2026. As the name suggests, the configuration will feature 256 M100 accelerators. Baidu will reportedly expand the system to an even larger compute domain with the launch of the Tianchi512 in late 2026, which will double the system's inference capacity."
Baidu introduced new AI accelerators and clustered systems to reduce reliance on Western chips and cut inference costs. The M100 is an inference-optimized accelerator from Kunlunxin designed to serve next-generation mixture-of-experts (MoE) models. MoE inference faces interconnect bandwidth and latency bottlenecks when models span multiple accelerators or servers. Baidu will offer M100s in a Tianchi256 cluster of 256 accelerators in early 2026 and expand to a Tianchi512 in late 2026 to double inference capacity. A training-focused M300 aimed at multi-trillion-parameter models will debut in 2027. Baidu also released ERNIE 5.0, a multimodal foundation model supporting text, images, audio, and video.
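To make the interconnect bottleneck concrete, here is a minimal sketch of top-k MoE token routing. This is illustrative only, not Baidu's or any production implementation; the parameters (number of experts, top-k, device placement) are hypothetical. The point it shows: each token is dispatched to its top-scoring experts, and when those experts live on other accelerators, every such assignment becomes cross-device traffic, which in real systems is an all-to-all over the interconnect.

```python
import numpy as np

# Illustrative sketch of top-k MoE routing (hypothetical parameters,
# not any production system). Shows how many token->expert
# assignments cross a device boundary and thus hit the interconnect.

rng = np.random.default_rng(0)

num_tokens = 8      # tokens in a batch (assumed)
d_model = 16        # hidden size (assumed)
num_experts = 4     # experts, assumed split evenly across devices
top_k = 2           # each token routed to its top-2 experts
num_devices = 2     # experts 0-1 on device 0, experts 2-3 on device 1

x = rng.standard_normal((num_tokens, d_model))
w_gate = rng.standard_normal((d_model, num_experts))

# Gating: score every expert per token, keep the top-k.
logits = x @ w_gate
topk = np.argsort(logits, axis=1)[:, -top_k:]

# Assumed placement: experts and tokens striped across devices.
device_of_expert = np.array(
    [e * num_devices // num_experts for e in range(num_experts)]
)
token_device = np.arange(num_tokens) % num_devices

# Each cross-device assignment is traffic over the interconnect;
# in a real deployment this dispatch is an all-to-all collective.
cross = sum(
    device_of_expert[e] != token_device[t]
    for t in range(num_tokens)
    for e in topk[t]
)
print(f"{cross} of {num_tokens * top_k} expert assignments cross devices")
```

Because roughly half the assignments land off-device in this toy setup, scaling the compute domain (as with the Tianchi256/512 clusters) keeps more of that dispatch inside one fast fabric rather than over slower server-to-server links.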
Read at The Register