
Demand for computing power to run AI models continues to rise, but two obstacles stand between providers and revenue: obtaining suitable chips and deploying them in data centers. General Compute rents AI processing power focused on inference, the phase in which a trained model generates responses. Hardware is increasingly specialized for inference because generating responses has different computational needs than training. Nvidia Groq and Cerebras illustrate the shift toward inference-focused chips. With capacity constrained at those providers, General Compute uses SambaNova inference chips. SambaNova’s architecture emphasizes flexibility and larger memory for storing context during inference, and the company claims higher performance than GPUs and other specialized chips. General Compute plans to use SN50 chips and targets 600 to 700 tokens per second, versus about 250 tokens per second for GPUs.
"The raging demand for computers to run AI models has only accelerated, but there are two major obstacles that anyone in the business needs to overcome: getting the right chips, and getting them into data centers where they can start generating revenue."
"The demand for GPUs has gone through the roof, but it's becoming conventional wisdom that they aren't the best-suited chips for running AI models once they have been trained. The phase of AI where a model is actively generating responses has different computational requirements than training, and a new class of chips is being designed specifically for it."
"With capacity strained at both those companies, the co-founders of General Compute, CEO Finn Puklowski and CTO Jason Goodison, found another option. They're turning to specialized chips built by SambaNova, an Intel-backed chipmaker focused on inference that has fallen a bit out of the Silicon Valley conversation."
"The architecture is more flexible and uses more memory to store context during inference calculations, and SambaNova claims that it outperforms not just GPUs but also other specialized chips built by the likes of Groq or Cerebras. Puklowski says the new chips will generate 600 to 700 tokens per second, versus about 250 tokens per second for GPUs."
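To put the quoted throughput figures in perspective, here is a minimal Python sketch of the arithmetic. The token rates are the ones Puklowski cites. The 1,000-token response length and the function names are illustrative assumptions, not figures or code from the article.

```python
# Throughput figures quoted in the article (tokens per second).
SAMBANOVA_TPS_RANGE = (600, 700)
GPU_TPS = 250

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 1000


def speedup_range(new_low, new_high, baseline):
    """Return the (low, high) throughput ratio relative to a baseline."""
    return new_low / baseline, new_high / baseline


def seconds_to_generate(tokens, tokens_per_second):
    """Time to emit a given number of tokens at a steady rate."""
    return tokens / tokens_per_second


low, high = speedup_range(*SAMBANOVA_TPS_RANGE, GPU_TPS)
print(f"Claimed speedup over GPUs: {low:.1f}x to {high:.1f}x")

gpu_seconds = seconds_to_generate(RESPONSE_TOKENS, GPU_TPS)
fast_seconds = seconds_to_generate(RESPONSE_TOKENS, SAMBANOVA_TPS_RANGE[1])
slow_seconds = seconds_to_generate(RESPONSE_TOKENS, SAMBANOVA_TPS_RANGE[0])
print(f"{RESPONSE_TOKENS} tokens on GPUs: {gpu_seconds:.1f} s")
print(f"{RESPONSE_TOKENS} tokens on the new chips: "
      f"{fast_seconds:.1f} s to {slow_seconds:.1f} s")
```

Under these assumptions, the claimed rates work out to roughly 2.4 to 2.8 times the GPU figure. The sketch assumes a steady generation rate and ignores factors such as time to first token, batching, and model size, none of which the article specifies.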
#ai-inference #gpus-vs-specialized-chips #data-center-deployment #token-generation-performance #sambanova-chips
Read at TechCrunch