#gpu-inference

DevOps
from The Register
2 days ago

Datadog digs down into GPU efficiency as AI costs soar

Datadog introduces GPU monitoring to enhance visibility and cost management for AI-driven organizations.
#intel
Business intelligence
from InfoWorld
2 days ago

How I doubled my GPU efficiency without buying a single new card

During prompt processing, the H100s were running at 92% compute utilization. Tensor cores fully saturated. Exactly what you want to see on a $30K GPU.
#deepseek
Data science
from The Register
1 day ago

DeepSeek's new models offer big inference cost savings

DeepSeek V4 introduces a new large language model that rivals top American models while reducing inference costs and supporting Huawei's AI accelerators.
European startups
from Fortune
1 day ago

DeepSeek unveils its newest model at rock-bottom prices and with 'full support' from Huawei chips | Fortune

DeepSeek released its V4 model, claiming performance rivaling top closed-source models, impacting market shares of competitors and related companies.
Artificial intelligence
from TechCrunch
1 day ago

DeepSeek previews new AI model that 'closes the gap' with frontier models | TechCrunch

DeepSeek launched V4 models, featuring 1 million token context windows and significant parameter counts, outperforming many peers in reasoning benchmarks.
#nvidia
Tech industry
from 24/7 Wall St.
1 day ago

Why Isn't NVIDIA Stock at $300 While Other Semiconductor Stocks Rally?

NVIDIA shares lag behind peers despite strong AI market growth, with a 7% year-to-date increase compared to significant gains from competitors.
Software development
from Ars Technica
3 weeks ago

Nvidia rolls out its fix for PC gaming's "compiling shaders" wait times

Nvidia's new Auto Shader Compilation feature allows automatic shader compilation during idle times to reduce load times for PC gamers.
Video games
from Gadgets 360
3 weeks ago

Nvidia Brings New AI Features With a New DLSS 4.5 Update

Nvidia's DLSS 4.5 update introduces 6X multi-frame generation and dynamic multi-frame generation for enhanced gaming performance.
Vue
from The Verge
3 weeks ago

Nvidia rolls out DLSS 4.5 update with new frame generation features

Nvidia's DLSS 4.5 update introduces AI-powered frame generation for RTX GPUs, enhancing performance and image quality in over 20 games.
Artificial intelligence
from news.bitcoin.com
6 days ago

Nvidia Releases Nemotron 3 Super, a 120B Open AI Model Built for Agentic Workloads

Nvidia launched Nemotron 3 Super, a 120 billion parameter model that significantly reduces AI compute costs and increases throughput.
Business
from 24/7 Wall St.
1 month ago

Nvidia Could Hit $340 by 2031 and the AI Buildout Is Just Getting Started

NVIDIA's stock is projected to reach $209.50 in one year and $298.29 in five years, driven by strong growth and strategic partnerships.
Science
from TechCrunch
2 days ago

AI galaxy hunters are adding to the global GPU crunch | TechCrunch

NASA will launch the Nancy Grace Roman space telescope in September 2026, providing 20,000 terabytes of data to astronomers.
#ai
Silicon Valley
from TechCrunch
1 month ago

Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way | TechCrunch

Gimlet Labs raised $80 million to enhance AI inference efficiency across diverse hardware types.
Artificial intelligence
from 24/7 Wall St.
3 days ago

Nvidia's CEO Sees Software Instances "Skyrocketing" - A Shift Investors Can't Ignore

Jensen Huang believes software will thrive despite the rise of AI agents, contradicting market fears of a software decline.
Careers
from Entrepreneur
3 days ago

Nvidia CEO Jensen Huang Says AI Won't Replace You - It Will Just Be a Really Annoying Micromanager

AI will not eliminate jobs but will act as a digital supervisor, enhancing productivity.
DevOps
from TechRepublic
3 days ago

AI Demand Is Forcing a Rethink of Data Center Power, Cooling

AI's rapid growth is challenging data center infrastructure, necessitating rethinking of power, cooling, and construction strategies.
Gadgets
from The Verge
4 days ago

Framework's first eGPUs turn its laptop into a desktop PC

Framework introduces the OCuLink Dev Kit for external GPU support, targeting power users with advanced connectivity options.
Photography
from Axios
3 days ago

Hands-on with ChatGPT's powerful new image engine

ChatGPT Images 2.0 offers personalized image creation with various aspect ratios and modes, enhancing user experience for both free and paid subscribers.
Business
from 24/7 Wall St.
4 days ago

Forget Nvidia: Why HPE Could Be the Overlooked AI Infrastructure Play of 2026

Hewlett Packard Enterprise is an overlooked investment opportunity in AI infrastructure with strong financial growth and expanding margins.
Vue
from Gadgets 360
4 days ago

GeForce Now Review: Is Nvidia's High-End Cloud Gaming Service For You?

Cloud gaming in India is overcoming hardware and pricing barriers, allowing access to high-end gaming without expensive equipment.
#meta
Tech industry
from InfoWorld
1 day ago

Meta's compute grab continues with agreement to deploy tens of millions of AWS Graviton cores

Meta is expanding its compute capabilities by partnering with AWS and utilizing multiple chip architectures for AI development.
Tech industry
from The Register
1 day ago

Meta to use millions of AWS Graviton cores

Meta will use tens of millions of AWS Graviton 5 CPU cores to support its AI deployments, marking a significant collaboration with Amazon.
Tech industry
from Computerworld
1 day ago

Meta's compute grab continues with agreement to deploy tens of millions of AWS Graviton cores

Meta is expanding its compute capabilities by partnering with AWS and utilizing multiple chip architectures for AI development.
Data science
from Techzine Global
2 days ago

Pinecone On-Demand is thirsty for bursty workloads

Pinecone offers solutions for variable and sustained query workloads in AI, focusing on cost-effective and predictable performance.
#tpu-8t
Tech industry
from Techzine Global
3 days ago

Google presents TPU 8t and TPU 8i chips; splits training and inference

Google Cloud introduces 8th-generation TPUs, TPU 8t for training and TPU 8i for inference, enhancing performance and efficiency in AI infrastructure.
#ai-infrastructure
DevOps
from Techzine Global
4 days ago

95% of GPU capacity goes unused in Kubernetes clusters

GPU and CPU usage remains low despite rising cloud costs, highlighting inefficiencies in resource utilization as Kubernetes adoption increases.
Venture
from TechCrunch
1 month ago

Thinking Machines Lab inks massive compute deal with Nvidia | TechCrunch

Mira Murati's Thinking Machines Lab signed a multi-year strategic partnership with Nvidia involving at least one gigawatt of Vera Rubin systems deployment starting in 2027, with Nvidia also making a strategic investment in the $12 billion-valued AI research company.
Gadgets
from The Register
4 days ago

AMD's Ryzen 9 9950X3D2 Dual Edition tested

The Ryzen 9 9950X3D2 DE features 16 cores and 208 MB cache, but offers limited performance gains over cheaper models.
Tech industry
from TechCrunch
1 day ago

In another wild turn for AI chips, Meta signs deal for millions of Amazon AI CPUs | TechCrunch

Meta has signed a deal to use millions of AWS Graviton chips for its AI needs, shifting from competitors like Google Cloud.
Tech industry
from InfoQ
1 day ago

Cloudflare Optimizes Edge Stack for High-Core CPUs Instead of Large Cache

Cloudflare's Gen 13 servers enhance performance by leveraging many processor cores instead of large CPU caches, improving capacity and energy efficiency.
Data science
from InfoQ
1 week ago

Google's TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware

TurboQuant compresses language models' Key-Value caches by up to 6x with near-zero accuracy loss, enabling efficient use of modest hardware.
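TurboQuant's actual algorithm is not described in the item above, but the general idea behind KV-cache quantization can be shown with a minimal, hypothetical symmetric-scaling sketch: each float is stored as a small integer plus one shared scale, trading a bounded round-trip error for a roughly 4x memory reduction versus float32.

```python
# Toy illustration of symmetric quantization, the basic mechanism
# behind KV-cache compression schemes. NOT TurboQuant's method.

def quantize(values, bits=8):
    """Map floats onto integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1                     # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(quants, scale):
    """Recover approximate floats; error is at most scale / 2 per value."""
    return [q * scale for q in quants]

kv_slice = [0.02, -1.5, 0.75, 3.0]                 # stand-in for KV entries
quants, scale = quantize(kv_slice)
approx = dequantize(quants, scale)
```

With 8 bits the largest magnitude maps to exactly 127, and every reconstructed value lands within half a scale step of the original, which is why such schemes can claim near-zero accuracy loss at modest bit widths.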
Gadgets
from Ars Technica
4 days ago

AMD Ryzen 9 9950X3D2 Dual Edition review: Tons of cache for tons of dollars

What we didn't really find in our testing was evidence that the extra 64MB of L3 cache meaningfully improved performance beyond what the regular 9950X3D can already do.
Artificial intelligence
from Fast Company
2 days ago

OpenAI releases GPT-5.5, a more powerful engine for coding, science, and general work

OpenAI released GPT-5.5, enhancing Codex's capabilities for complex coding tasks and scientific work with improved autonomous functionality.
Python
from The JetBrains Blog
2 weeks ago

How to Train Your First TensorFlow Model in PyCharm | The PyCharm Blog

TensorFlow is an open-source framework for building and deploying machine learning models using tensors and high-level libraries like Keras.
Gadgets
from WIRED
4 days ago

I've Tested Gaming Laptops for Over a Decade. This Is What I Think You Should Buy

Gaming laptops have evolved significantly, offering powerful performance and sleek designs, making them viable alternatives to desktop PCs.
Tech industry
from The Register
3 days ago

Google dual tracks TPU 8 to conquer training and inference

Google introduced TPU 8t and TPU 8i, enhancing AI training speed and reducing model serving costs significantly.
Data science
from The Register
1 week ago

Nvidia slaps forehead: AI, that's what quantum needs!

Nvidia's AI models aim to reduce quantum processor error rates significantly, enhancing the reliability of quantum computing applications.
Tech industry
from TechCrunch
3 days ago

Google Cloud launches two new AI chips to compete with Nvidia | TechCrunch

Google Cloud's TPU 8t and TPU 8i chips enhance AI model training and inference, offering significant performance improvements over previous generations.
#ai-chips
Artificial intelligence
from 24/7 Wall St.
2 days ago

Wall Street Pro Thinks Google's AI Chip Edge Is Getting Harder to Ignore

Alphabet's TPUs are emerging as competitive alternatives to Nvidia's GPUs, showcasing significant performance and cost advantages.
Tech industry
from www.businessinsider.com
3 days ago

Google's new chips are a shot at Nvidia and a big hint at where AI goes next

Google unveiled its latest AI chips, TPU 8t for training and TPU 8i for inference, responding to industry shifts towards inference computing.
Tech industry
from The Register
2 days ago

AI now gobbling up power and management chips for servers

The chip shortage is impacting power management chips, threatening server shipments as demand for AI products prioritizes manufacturing capacity.
#google
Tech industry
from TNW | Deep-Tech
3 days ago

Google launches Ironwood TPU and previews eighth-gen split into training and inference chips at TSMC 2nm

Google's Ironwood TPU delivers 4.6 petaFLOPS per chip, marking a significant advancement in AI infrastructure with separate training and inference chips.
Artificial intelligence
from Axios
5 days ago

Anthropic bites back in the compute wars with Amazon partnership

Anthropic is investing heavily in compute capacity to enhance its Claude models, competing directly with OpenAI's infrastructure advantage.
#ai-efficiency
Artificial intelligence
from Medium
1 month ago

Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Model quantization and architectural optimization can outperform larger models, challenging the belief that more GPUs equal greater intelligence.
Tech industry
from The Register
1 month ago

Nvidia slaps Groq into new LPX racks for faster AI response

Nvidia integrates Groq's language processing units into Vera Rubin systems to dramatically accelerate LLM inference, enabling hundreds to thousands of tokens per second per user.
Tech industry
from Computerworld
1 month ago

System-level 'coopetition': Why Nvidia's DGX Rubin NVL8 runs on Intel Xeon 6

Nvidia's flagship DGX Rubin NVL8 AI systems use Intel Xeon 6 processors as host CPUs to maintain x86 compatibility and meet enterprise deployment requirements.
Artificial intelligence
from TechCrunch
1 month ago

Niv-AI exits stealth to wring more power performance out of GPUs | TechCrunch

AI data centers waste significant power due to GPU demand surges, forcing operators to throttle performance by up to 30%, prompting startups like Niv-AI to develop precision power management solutions.
Tech industry
from Axios
1 month ago

Nvidia's race to outpace physics

Nvidia CEO projects at least $1 trillion in revenue from newest chips through 2027, though market dominance has declined from 100% to 65% as energy efficiency becomes critical to AI scaling.
Artificial intelligence
from Techzine Global
1 month ago

Nvidia's Groq 3 LPU targets agentic AI inference at GTC 2026

Nvidia's acquisition of Groq technology produces the Groq 3 LPU, a specialized inference chip delivering 40 petabytes per second bandwidth, significantly outpacing GPU inference speeds.
Tech industry
from 24/7 Wall St.
1 month ago

Nvidia GPU availability near zero, AI compute demand off the charts

GPU availability is near zero, indicating demand from hyperscalers and enterprises far exceeds supply, validated by Nvidia's 73% revenue growth and 75% data center revenue increase.
Artificial intelligence
from InfoWorld
1 month ago

Nvidia launches Nemotron 3 Super to power enterprise AI agents

Nemotron 3 Super's hybrid architecture combining Mamba and Transformer technologies enables enterprises to run complex AI agents more efficiently with lower costs and faster execution on existing infrastructure.
#ai-agents
Artificial intelligence
from Engadget
1 month ago

NVIDIA is reportedly working on its own open-source AI agent platform

NVIDIA is developing NemoClaw, an enterprise-focused open-source AI agent platform designed to work across non-NVIDIA hardware with enhanced security features.
Artificial intelligence
from WIRED
1 month ago

Nvidia Is Planning to Launch an Open-Source AI Agent Platform

Nvidia is launching NemoClaw, an open-source AI agent platform enabling enterprise software companies to deploy AI agents for workforce task automation, accessible regardless of chip dependency.
Artificial intelligence
from Techzine Global
2 months ago

OpenAI seeks faster alternatives to Nvidia chips

OpenAI seeks alternative inference chips with larger on-chip SRAM to improve response speed for coding and AI-to-AI communication, aiming for about 10% of future inference capacity.
Artificial intelligence
from TechCrunch
2 months ago

Running AI models is turning into a memory game | TechCrunch

Rising DRAM prices and sophisticated prompt-caching orchestration make memory management a critical cost and performance factor for large-scale AI deployments.
Artificial intelligence
from Cointelegraph
2 months ago

What Role Is Left for Decentralized GPU Networks in AI?

What we are beginning to see is that many open-source and other models are becoming compact enough and sufficiently optimized to run very efficiently on consumer GPUs,
Artificial intelligence
from 24/7 Wall St.
1 month ago

NVIDIA Cements Its Role as the Backbone of AI Infrastructure

NVIDIA's networking revenue grew 162% year-over-year to $8.2 billion, nearly tripling GPU growth, signaling a shift from chip seller to integrated infrastructure provider selling complete AI data center systems.
Artificial intelligence
from InfoQ
2 months ago

NVIDIA Dynamo Planner Brings SLO-Driven Automation to Multi-Node LLM Inference

The new capabilities center on two integrated components: the Dynamo Planner Profiler and the SLO-based Dynamo Planner. Together they address the "rate matching" challenge in disaggregated serving, where inference workloads are split so that prefill operations, which process the input context, run on a different GPU pool from decode operations, which generate output tokens. Without such tooling, teams spend considerable time determining the optimal GPU allocation across these two phases.
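The rate-matching problem described above can be sketched as a toy search: given per-GPU token throughput for each phase (the numbers below are invented for illustration and are not Dynamo profiler output), pick the pool split whose aggregate rates balance most closely.

```python
# Hypothetical sketch of "rate matching" in disaggregated serving.
# An illustration of the idea only, not the Dynamo Planner's algorithm.

def match_rates(total_gpus: int,
                prefill_tok_s: float,
                decode_tok_s: float) -> tuple[int, int]:
    """Split GPUs between prefill and decode pools so that the two
    pools' aggregate tokens-per-second are as close as possible."""
    best_split, best_gap = (1, total_gpus - 1), float("inf")
    for prefill_gpus in range(1, total_gpus):
        decode_gpus = total_gpus - prefill_gpus
        # gap between aggregate prefill and decode token rates
        gap = abs(prefill_gpus * prefill_tok_s - decode_gpus * decode_tok_s)
        if gap < best_gap:
            best_split, best_gap = (prefill_gpus, decode_gpus), gap
    return best_split

# e.g. with made-up throughputs of 30k prefill and 10k decode tokens/s
# per GPU, an 8-GPU node splits 2 prefill / 6 decode.
split = match_rates(8, 30_000, 10_000)
```

A real planner would fold in SLO targets, batch sizes, and interconnect costs rather than raw token rates, but the core balancing act is the same.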
Artificial intelligence
from Hackernoon
2 months ago

This "Flash" AI Model Is Fast and Dangerous at Math-Here's What It Can Do | HackerNoon

GLM-4.7-Flash is a 30-billion-parameter mixture-of-experts model offering strong performance for lightweight deployment.
Artificial intelligence
from ComputerWeekly.com
1 month ago

Edge AI: What's working and what isn't | Computer Weekly

Edge AI deployment success depends on identifying efficient, narrow use cases with manageable risks rather than pursuing sophisticated, large-scale models across all applications.