#llm-serving

from The Register
17 hours ago

Alibaba reveals 82 percent GPU resource savings

Titled "Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market", the paper [PDF] opens by pointing out that model-mart Hugging Face lists over a million AI models, although customers mostly run just a few of them. Alibaba Cloud nonetheless offers many models but found it had to dedicate 17.7 percent of its GPU fleet to serving just 1.35 percent of customer requests.
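The imbalance the paper describes can be made concrete with a little arithmetic on the figures quoted above; the 1,000-GPU fleet slice below is a hypothetical number for illustration, not from the paper.

```python
# Illustrative arithmetic using the figures quoted in the snippet.
requests_share = 1.35 / 100   # share of customer requests hitting long-tail models
fleet_share = 17.7 / 100      # share of the GPU fleet dedicated to serving them

# How over-provisioned those models are relative to their traffic.
imbalance = fleet_share / requests_share
print(f"{imbalance:.1f}x more GPU share than request share")  # ~13.1x

# The claimed 82 percent saving applied to a hypothetical 1,000-GPU slice.
gpus_before = 1000
gpus_after = gpus_before * (1 - 0.82)
print(f"{gpus_after:.0f} GPUs needed after pooling")  # ~180
```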
from InfoQ
4 months ago

Scaling Large Language Model Serving Infrastructure at Meta

LLM serving is evolving into a foundational technology, playing a role for AI workloads similar to that of an operating system.