As Google eyes exponential surge in serving capacity, analyst says we're entering 'stage two of AI' where bottlenecks are physical constraints | Fortune

"Google's AI infrastructure boss warned the company needs to scale up its tech to accommodate a massive influx of users and complex requests being handled by AI products-and it may be a sign that fears of a bubble are overblown. Amin Vahdat, a VP who leads the global AI and infrastructure team at Google, said during a presentation at a Nov. 6 all-hands meeting that the company needs to double its serving capacity every six months, with "the next 1000x in 4-5 years," CNBC reported. This refers to Google's ability to ensure that Gemini and other AI products depending on Google Cloud can still work well when queried by a skyrocketing number of users. That's different from compute, or the physical infrastructure involved in training AI."
"A Google spokesperson told Fortune that "demand for AI services means we are being asked to provide significantly more computing capacity, which we are driving through efficiency across hardware, software, and model optimizations, in addition to new investments," pointing to the company's Ironwood chips as an example of its own hardware driving improvements in computing capacity."
"In previous years, every hyperscaler-think Google Cloud but also Amazon and Microsoft Azure-rushed to increase compute in anticipation of an influx of AI users. Now, the users are here, said Shay Boloor, chief market strategist at Futurum Equities. But as each company ratchets up its AI offerings, serving capacity is emerging as the next major challenge to tackle. "We're entering the stage two of AI where serving capacity matters even more than the compute capacity, because the compute creates the model, but serving capacity determines how widely and how quickly that model can actually reach the users," he told Fortune. Google, with its vast capital expenditures and past strategic moves to develop its own AI chips, is likely capable of doubling its serving capacity every six months, said Boloor."
Google says it needs to double its serving capacity every six months, targeting roughly 1000x growth in 4–5 years, to handle a massive influx of AI users and increasingly complex requests. Serving capacity is the ability to keep products like Gemini responsive as query volume grows, and it is distinct from training compute. Google says it is meeting that demand through efficiency gains across hardware, software, and model optimizations, plus new investments, and points to its Ironwood chips as an example. Hyperscalers previously raced to build compute ahead of demand; in what analyst Shay Boloor calls "stage two of AI," serving capacity determines how widely and how quickly a model reaches users. Boloor says Google’s capital expenditures and in-house chip efforts make the six-month doubling likely achievable, while Amazon and Microsoft face the same challenge.
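The two figures quoted above, doubling every six months and 1000x in 4-5 years, can be checked against each other with simple compound-growth arithmetic. The sketch below is only an illustration: it assumes perfectly steady doubling, which the article does not claim, and the function names are made up for this example.

```python
import math

# Assumption: capacity doubles at a perfectly steady rate, once every six months.
DOUBLING_PERIOD_YEARS = 0.5


def growth_factor(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Multiplicative growth after `years` of steady doubling."""
    return 2 ** (years / doubling_period)


def years_to_reach(factor, doubling_period=DOUBLING_PERIOD_YEARS):
    """Years of steady doubling needed to grow by `factor`."""
    return math.log2(factor) * doubling_period


for years in (4, 4.5, 5):
    print(f"{years} years -> {growth_factor(years):,.0f}x")

print(f"1000x takes about {years_to_reach(1000):.2f} years")
```

Under that assumption, four years of six-month doublings gives 256x and five years gives 1024x, so the 1000x figure corresponds to the five-year end of the quoted range. Reaching 1000x in four years would need a doubling roughly every 4.8 months.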
Read at Fortune