#ai-inference

Artificial intelligence
from Computerworld
5 days ago

CES 2026: AI compute sees a shift from training to inference

AI spending is shifting from training-heavy investment to inference-heavy investment, with forecasts projecting roughly 80% of future AI spend on inference.
#nvidia
from Fortune
1 week ago
Startup companies

After Nvidia's Groq deal, these are the AI chip startups sitting pretty, and one aiming to disrupt

from Business Insider
2 weeks ago
Silicon Valley

In a new deal, Nvidia hires Groq's top engineering talent, including its founder, who built AI chips at Google

Artificial intelligence
from Fortune
2 weeks ago

Nvidia's Groq bet shows that the economics of AI chip-building are still unsettled

Inference determines AI profitability, requiring specialized, low-latency hardware beyond GPUs to reduce costs and handle large-scale, real-time model serving.
Artificial intelligence
from Axios
2 weeks ago

Nvidia deal shows why inference is AI's next battleground

Inference performance and cost-efficiency are critical bottlenecks for scaling and monetizing AI, and Groq's inference-focused chips aim to address that gap.
from ZDNET
1 month ago

Cloud-native computing is poised to explode, thanks to AI inference work

AI inference is the process by which a trained model, such as a large language model (LLM), applies what it has learned to new data to make predictions, decisions, or classifications. In practical terms, after a model is trained (say, the new GPT 5.1), we use it during the inference phase: it analyzes new data (like an image) and produces an output (identifying what's in the image) without being explicitly programmed for each fresh input. These inference workloads bridge the gap between LLMs and the AI chatbots and agents built on them.
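The training/inference split described above can be sketched with a toy example. This is purely illustrative and is not how an LLM works internally: the nearest-centroid "model", the labels, and the feature vectors are all invented for the sketch. The shape is the point: training learns parameters once, and inference then applies the frozen model repeatedly to unseen inputs.

```python
# Toy sketch of the training-vs-inference split (illustrative only).
# "Training" learns one centroid per label; "inference" applies the
# frozen model to a new input without any further learning.

def train(examples):
    """Training phase: learn per-label centroids from labeled data."""
    sums, counts = {}, {}
    for features, label in examples:
        s = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            s[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def infer(model, features):
    """Inference phase: classify a new input against the frozen model."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, model[label]))
    return min(model, key=dist)

model = train([([0.0, 0.1], "cat"), ([0.9, 1.0], "dog")])
print(infer(model, [0.8, 0.9]))  # prints "dog": new input, no retraining needed
```

Real LLM serving follows the same pattern at vastly larger scale, which is why the hardware and infrastructure stories in this feed treat inference as a distinct workload from training.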
Artificial intelligence
from IT Pro
2 months ago

What is a tensor processing unit (TPU)?

TPUs are Google-designed ASICs evolved to massive-scale AI accelerators, culminating in the Ironwood chip delivering exaflop-level inference performance and high memory bandwidth.
Tech industry
from IT Pro
2 months ago

Cisco wants to take AI closer to the edge

Cisco introduced Cisco Unified Edge, a scalable, modular platform combining computing, networking, and storage to run real-time AI inference at the enterprise edge.
#qualcomm
from The Register
2 months ago

Qualcomm announces AI accelerators and racks they'll run in

Qualcomm claims a generational leap in efficiency and performance for AI inference workloads, delivering more than 10x higher effective memory bandwidth and much lower power consumption.
Artificial intelligence
from 24/7 Wall St.
2 months ago

Oracle Executive Just Gave 50,000 Reasons to Buy AMD Stock Right Now

AMD rapidly became a meaningful AI GPU competitor, gaining 10–15% market share through MI300X performance, hyperscaler partnerships, and a roadmap toward more efficient inference.
Artificial intelligence
from Techzine Global
3 months ago

Intel expands AI portfolio with Crescent Island GPU

Intel's Crescent Island GPU targets AI inference with 160GB LPDDR5X, emphasizing energy efficiency, cost-effectiveness, and air-cooled deployment, with first units due H2 2026.
Artificial intelligence
from Telecompetitor
3 months ago

123NET Expands Southfield Data Center for AI and High-Density Deployments

123NET expanded Southfield DC1 with a 4 MW high-density GPU colocation, liquid/air cooling, and on-site DET-iX free peering for low-latency AI.
Artificial intelligence
from Fortune
3 months ago

Jensen Huang doesn't care about Sam Altman's AI hype fears: he thinks OpenAI will be the first "multi-trillion dollar hyperscale company"

Relentless inference demand from accelerated AI computing will drive a generational shift away from general-purpose computing, positioning OpenAI to become a multitrillion-dollar hyperscale company.
from Silicon Valley Journals
4 months ago

Baseten raises $150 million to power the future of AI inference

Baseten just pulled in a massive $150 million Series D, vaulting the AI infrastructure startup to a $2.15 billion valuation and cementing its place as one of the most important players in the race to scale inference: the behind-the-scenes compute that makes AI apps actually run. If the last generation of great tech companies was built on the cloud, the next wave is being built on inference. Every time you ask a chatbot a question, generate an image, or tap into an AI-powered workflow, inference is happening under the hood.
Venture
Artificial intelligence
from Fortune
4 months ago

Exclusive: Baseten, AI inference unicorn, raises $150 million at $2.15 billion valuation

Baseten provides inference infrastructure that enables companies to deploy, manage, and scale AI models while rapidly increasing revenue and valuation.
Artificial intelligence
from InfoWorld
4 months ago

Evolving Kubernetes for generative AI inference

Kubernetes now includes native AI inference features, including vLLM support, inference benchmarking, LLM-aware routing, inference gateway extensions, and accelerator scheduling.
#amd
from Techzine Global
7 months ago
Artificial intelligence

AMD makes third acquisition in eight days as feeding frenzy continues

AMD's recent acquisitions aim to bolster its AI inferencing capabilities against Nvidia.
Untether AI's team is joining AMD, signaling a strategic consolidation within AI technology.
from Business Insider
8 months ago
Artificial intelligence

AMD's CTO says AI inference will move out of data centers and increasingly to phones and laptops

AMD is positioning itself to capitalize on the shift to AI inference, targeting market segments traditionally dominated by Nvidia.
Artificial intelligence
from InfoQ
7 months ago

Google Enhances LiteRT for Faster On-Device Inference

LiteRT simplifies on-device ML inference with enhanced GPU and NPU support for faster performance and lower power consumption.
from Techzine Global
7 months ago

Red Hat lays foundation for AI inferencing: Server and llm-d project

AI inferencing is crucial for unlocking the full potential of artificial intelligence, as it enables models to apply learned knowledge to real-world situations.
Artificial intelligence
from IT Pro
8 months ago

'TPUs just work': Why Google Cloud is betting big on its custom chips

Google's seventh generation TPU, 'Ironwood', aims to lead in AI workload efficiency and cost-effectiveness.
TPUs were developed with a cohesive hardware-software synergy, enhancing their utility for AI applications.