
"Nvidia's $20 billion acquihire of Groq back in December is a prime example. The startup's SRAM-heavy chip architecture meant that, with enough of them, Groq's LPUs could churn out tokens faster than any GPU."
"Nvidia side stepped this problem by moving the compute heavy prefill bit of the inference pipeline to its GPUs while it kept the bandwidth-constrained decode operations on its shiny new LPUs."
"So far, most of the AI chip startups' wins have been on the decode side of the equation. SRAM, while not particularly capacious, is stupendously fast."
"This week, Lumai detailed its optical inference accelerator, which uses light, rather than traditional methods, to enhance performance in inference tasks."
AI adoption is shifting from training new models to serving them, and inference presents diverse workload opportunities for chip startups: it demands a different mix of compute, memory capacity, and bandwidth than training does. Nvidia's acquisition of Groq exemplifies the shift, pairing GPUs for the compute-heavy prefill phase with Groq's LPUs for the bandwidth-constrained decode phase. Other companies, including AWS and Intel, are developing similar disaggregated compute platforms, pointing to a broader industry move toward optimizing inference.
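To make the prefill/decode split concrete, here is a minimal sketch of disaggregated inference using a toy single-head attention layer. The phase boundary is the real technique; the comments mapping phases to hardware are illustrative, not Nvidia's or Groq's actual software.

```python
# Minimal sketch of prefill/decode disaggregation with toy attention.
# The "GPU phase" / "LPU phase" labels are illustrative assumptions.
import numpy as np

D = 64  # model width
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def prefill(prompt_embeds):
    """Compute-bound phase: one large matmul over the whole prompt.
    In a disaggregated setup this runs on compute-heavy hardware."""
    return prompt_embeds @ Wk, prompt_embeds @ Wv

def decode_step(x, k_cache, v_cache):
    """Bandwidth-bound phase: one token's query sweeps the whole KV
    cache, so fast memory dominates. Runs on SRAM-heavy hardware."""
    q = x @ Wq
    k_cache = np.vstack([k_cache, x @ Wk])
    v_cache = np.vstack([v_cache, x @ Wv])
    scores = k_cache @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache, k_cache, v_cache

prompt = rng.standard_normal((128, D))   # 128-token prompt
k_cache, v_cache = prefill(prompt)       # "GPU" phase: big, parallel
x = rng.standard_normal(D)
for _ in range(8):                       # "LPU" phase: serial, cache-sweeping
    x, k_cache, v_cache = decode_step(x, k_cache, v_cache)
```

The prefill call is one large, parallelizable matmul over the full prompt, while each decode step touches the entire cache just to emit a single token, which is why the two phases suit different silicon.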
Read at The Register