#model-inference

Artificial intelligence
from InfoQ
1 week ago

KubeCon NA 2025 - Erica Hughberg and Alexa Griffith on Tools for the Age of GenAI

GenAI platforms require AI-native routing, token-level rate limiting, and centralized credential management, as well as observability, resilience, and failover, all enabled by Kubernetes-based tools.
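The token-level rate limiting mentioned in the summary can be sketched as a token bucket whose cost is measured in LLM tokens consumed rather than requests made. This is an illustrative sketch only; the class name and parameters are hypothetical and not taken from any of the tools discussed in the talk.

```python
import time


class TokenBudgetLimiter:
    """Hypothetical token-level rate limiter: a token bucket whose
    cost is the number of LLM tokens in a request, not the request count."""

    def __init__(self, tokens_per_second: float, burst: float):
        self.rate = tokens_per_second      # refill rate, in LLM tokens per second
        self.capacity = burst              # maximum token budget
        self.available = burst             # current token budget
        self.last = time.monotonic()

    def allow(self, token_cost: int) -> bool:
        """Return True and deduct the budget if the request fits, else False."""
        now = time.monotonic()
        # Refill the budget based on elapsed time, capped at burst capacity.
        self.available = min(self.capacity,
                             self.available + (now - self.last) * self.rate)
        self.last = now
        if token_cost <= self.available:
            self.available -= token_cost
            return True
        return False


# Usage: a 100-token burst budget; a second large request is rejected.
limiter = TokenBudgetLimiter(tokens_per_second=1.0, burst=100)
limiter.allow(60)  # fits within the budget
limiter.allow(60)  # exceeds the remaining budget
```

In practice, gateways apply this per tenant or per API key, estimating prompt tokens up front and reconciling against actual usage after the response streams back.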
from The Register
3 months ago

Alibaba looks to end reliance on Nvidia for AI inference

First reported by the Wall Street Journal on Friday, the e-commerce giant's latest chip is aimed specifically at AI inference, which refers to serving models as opposed to training them. Alibaba's T-Head division has been working on AI silicon for some time. In 2019, it introduced the Hanguang 800. However, unlike modern chips from Nvidia and AMD, that part was aimed primarily at conventional machine-learning models like ResNet, not the large language and diffusion models that power today's AI chatbots and image generators.