#ai-inference

Artificial intelligence
from IT Pro
2 weeks ago

'TPUs just work': Why Google Cloud is betting big on its custom chips

Google's seventh generation TPU, 'Ironwood', aims to lead in AI workload efficiency and cost-effectiveness.
TPUs were developed with tightly integrated hardware and software, enhancing their utility for AI applications.
#nvidia
from Business Insider
2 weeks ago
Artificial intelligence

AMD's CTO says AI inference will move out of data centers and increasingly to phones and laptops

AMD is positioning itself to capitalize on the shift to AI inference, targeting market segments traditionally dominated by Nvidia.
from The Register
1 month ago
Tech industry

Nvidia unveils 288 GB Blackwell Ultra GPUs

Nvidia's Blackwell Ultra architecture enhances AI inference with massive performance and increased memory capacity.
Improved throughput enables AI models like DeepSeek-R1 to respond more rapidly.
from InfoQ
3 months ago
Agile

Hugging Face Expands Serverless Inference Options with New Provider Integrations

Hugging Face now integrates four serverless inference providers into its platform, enhancing ease of access and speed for AI model inference.
#edge-computing
from InfoQ
5 months ago
Artificial intelligence

Efficient Resource Management with Small Language Models (SLMs) in Edge Computing

Small Language Models (SLMs) enable AI inference on edge devices without exceeding their resource limits.
from The Register
6 months ago
Miscellaneous

Supermicro crams 18 GPUs into a 3U box

Supermicro's SYS-322GB-NR packs 18 GPUs into a compact 3U chassis for edge AI and visualization tasks.
from TechCrunch
7 months ago
Artificial intelligence

Runware uses custom hardware and advanced orchestration for fast AI inference | TechCrunch

Runware offers rapid image generation through optimized servers, aiming to disrupt traditional GPU rental models with API-based pricing.