Alibaba just admitted it's struggling to keep up with rival chipmakers and AI shops
Briefly

Alibaba's chip design business T-Head has revealed the Zhenwu M890 accelerator, alongside the Panjiu AL128 rack-scale server system it will run in. The M890 includes 144GB of on-chip memory, provides 800 GB per second of inter-chip bandwidth, and natively supports precision formats from FP32 down to FP4. Alibaba stated that the M890 delivers three times the performance of its predecessor, the Zhenwu 810E, but provided no other performance figures. T-Head has produced only 560,000 Zhenwu chips to date and has not disclosed production volumes for the M890. By contrast, Nvidia says AWS alone will deploy one million of its GPUs this year. The Panjiu AL128 Supernode Server packs 128 AI accelerators into a single rack-scale unit and delivers petabyte-per-second internal bandwidth. Alibaba says it is designed for the unpredictable, high-frequency bursts of inference requests that AI agents generate.
"The new chip is called the Zhenwu M890, and comes from Alibaba's semiconductor design business T-Head. Neither company has said much about it other than stating it includes 144GB of on-chip memory, possesses '800 GB per second of inter-chip bandwidth' and natively supports precision formats from FP32 down to FP4. The Chinese giant didn't offer any info about performance other than to say it delivers 'three times the performance of its predecessor, Zhenwu 810E.'"
"That means the most interesting figure in Alibaba's announcement is 560,000 - the number of Zhenwu chips Alibaba says T-Head has made to date. By way of contrast, Nvidia says AWS alone will rack and stack one million of its GPUs this year. AWS's spending on AI infrastructure is at similar levels to Microsoft, Meta, and Google, so it's conceivable that Nvidia will make and sell three or four million GPUs to satisfy those four customers alone."
"The company did talk up the machines the M890 will run inside - a new beast called the Panjiu AL128 Supernode Server Alibaba described as 'a rack-scale system that packs 128 AI accelerators into a single unit and delivers petabyte-per-second internal bandwidth ... designed specifically for the concurrency patterns that agents generate: unpredictable, high-frequency bursts of inference requests that overwhelm conventional compute clusters.'"
Read at The Register