#inference-optimization

from The Register
5 days ago

Alibaba reveals 82 percent GPU resource savings

Titled "Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market", the paper [PDF] opens by pointing out that model-mart Hugging Face lists over a million AI models, although customers mostly run just a few of them. Alibaba Cloud nonetheless offers many models but found it had to dedicate 17.7 percent of its GPU fleet to serving just 1.35 percent of customer requests.
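The pooling idea behind the paper can be illustrated with a toy scheduler: instead of dedicating one GPU per rarely-requested model, a small shared pool loads models on demand and evicts the least-used one when full. This is a hypothetical sketch of the general GPU-pooling concept, not Aegaeon's actual algorithm; all names and the eviction policy here are illustrative assumptions.

```python
from collections import defaultdict

class GpuPool:
    """Toy GPU pool that multiplexes many long-tail models
    onto a few GPUs (illustrative, not Aegaeon's scheduler)."""

    def __init__(self, num_gpus):
        self.num_gpus = num_gpus
        self.loaded = {}              # gpu_id -> model currently resident
        self.hits = defaultdict(int)  # model -> request count

    def serve(self, model):
        # Reuse a GPU that already has this model resident.
        for gpu, resident in self.loaded.items():
            if resident == model:
                self.hits[model] += 1
                return gpu
        # Otherwise claim a free GPU, or evict the least-requested model.
        if len(self.loaded) < self.num_gpus:
            gpu = len(self.loaded)
        else:
            gpu = min(self.loaded, key=lambda g: self.hits[self.loaded[g]])
        self.loaded[gpu] = model
        self.hits[model] += 1
        return gpu
```

With a pool of two GPUs, requests for three different models are served by swapping the coldest model out, rather than reserving a third GPU that would sit mostly idle; this is the imbalance the paper targets, where 17.7 percent of the fleet served 1.35 percent of requests.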
Artificial intelligence
from TechCrunch
1 month ago

Clarifai's new reasoning engine makes AI models faster and less expensive

Clarifai's new reasoning engine roughly doubles inference speed and cuts inference costs by about 40 percent through software optimizations that adapt across models and cloud hosts.