Cerebras gives waferscale chips an inferencing twist
Briefly

Cerebras Systems' new WSE-3 accelerator, equipped with 44GB of on-chip SRAM, targets inference workloads, with the company claiming generation rates of 1,800 tokens/second — far faster than comparable Nvidia H100-based systems.
Cerebras CEO Andrew Feldman says such fast generation lets developers build applications that chain multiple models together without noticeable latency, likening today's generative AI experience to the dial-up internet era.
Cerebras attributes the speedup to keeping model weights in high-bandwidth on-chip SRAM rather than external memory, which it says allows large language models to iteratively refine their outputs instead of returning a single response.
If the bandwidth claims hold up, Cerebras' pitch is that generative AI's operational speed shifts from a dial-up-like experience to something closer to broadband.
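The bandwidth argument can be sketched with a simple roofline estimate: for memory-bound token generation, the ceiling on tokens/second is roughly memory bandwidth divided by the bytes of weights streamed per token. A minimal sketch, using illustrative figures (an assumed 8B-parameter model at 2 bytes/parameter, ~3.35 TB/s for H100 HBM3, and Cerebras' quoted ~21 PB/s aggregate SRAM bandwidth for WSE-3; KV-cache traffic and overlap are ignored):

```python
def tokens_per_second(bandwidth_bytes_per_s: float,
                      params: float,
                      bytes_per_param: float = 2.0) -> float:
    # Memory-bound decode roofline: each generated token streams
    # all model weights from memory once (simplified; ignores
    # KV cache, batching, and compute overlap).
    bytes_per_token = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token

# Illustrative, assumed figures -- not vendor-verified benchmarks:
llama_8b = 8e9    # parameters in an 8B model
hbm = 3.35e12     # ~H100 SXM HBM3 bandwidth, bytes/s
sram = 21e15      # Cerebras-quoted WSE-3 on-chip SRAM bandwidth, bytes/s

print(f"HBM ceiling:  {tokens_per_second(hbm, llama_8b):,.0f} tok/s")
print(f"SRAM ceiling: {tokens_per_second(sram, llama_8b):,.0f} tok/s")
```

Under these assumptions the HBM ceiling lands in the low hundreds of tokens/second for a single stream, while the on-chip SRAM ceiling is orders of magnitude higher — which is the rough shape of the gap Cerebras is advertising, even if real-world figures differ.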
Read at The Register