
"Nvidia's $20 billion acquihire of Groq back in December is a prime example. The startup's SRAM-heavy chip architecture meant that, with enough of them, Groq's LPUs could churn out tokens faster than any GPU."
"Nvidia side stepped this problem by moving the compute heavy prefill bit of the inference pipeline to its GPUs while it kept the bandwidth-constrained decode operations on its shiny new LPUs."
"So far, most of the AI chip startups' wins have been on the decode side of the equation. SRAM, while not particularly capacious, is stupendously fast."
"This week, Lumai detailed its optical inference accelerator, which uses light, rather than traditional methods, to enhance performance in inference tasks."
AI adoption is shifting from training new models to serving them, and inference presents diverse workload opportunities for chip startups: it demands a different mix of compute, memory capacity, and bandwidth than training does. Nvidia's acquisition of Groq exemplifies the shift, pairing GPUs for the compute-heavy prefill phase with Groq's LPUs for the bandwidth-constrained decode phase. Other companies, including AWS and Intel, are developing similar disaggregated compute platforms, pointing to a broader industry move toward optimizing inference.
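To make the prefill/decode split concrete, here is a minimal sketch of disaggregated inference using a toy single-head attention layer. The phase boundary is the real technique; the comments mapping phases to hardware are illustrative, not Nvidia's or Groq's actual software.

```python
# Minimal sketch of prefill/decode disaggregation with toy attention.
# The "GPU phase" / "LPU phase" labels are illustrative assumptions.
import numpy as np

D = 64  # model width
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def prefill(prompt_embeds):
    """Compute-bound phase: one large matmul over the whole prompt.
    In a disaggregated setup this runs on compute-heavy hardware."""
    return prompt_embeds @ Wk, prompt_embeds @ Wv

def decode_step(x, k_cache, v_cache):
    """Bandwidth-bound phase: one token's query sweeps the whole KV
    cache, so fast memory dominates. Runs on SRAM-heavy hardware."""
    q = x @ Wq
    k_cache = np.vstack([k_cache, x @ Wk])
    v_cache = np.vstack([v_cache, x @ Wv])
    scores = k_cache @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache, k_cache, v_cache

prompt = rng.standard_normal((128, D))   # 128-token prompt
k_cache, v_cache = prefill(prompt)       # "GPU" phase: big, parallel
x = rng.standard_normal(D)
for _ in range(8):                       # "LPU" phase: serial, cache-sweeping
    x, k_cache, v_cache = decode_step(x, k_cache, v_cache)
```

The prefill call is one large, parallelizable matmul over the full prompt, while each decode step touches the entire cache just to emit a single token, which is why the two phases suit different silicon.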
Read at The Register