SambaNova Cloud serves up Llama 3.1 405B at 100+ token/s
Briefly

"According to CEO Rodrigo Liang, SambaNova has managed to get Meta's 405 billion parameter Llama 3.1 model to churn out tokens at a rate of 132 per second and at the full 16-bit precision it was trained at no less."
"To put that in perspective, its estimated the average person can read at about 5 words per second. At 132 tokens a second, SambaNova's system is nearly twice as fasts as the next fastest GPU systems."
Read at The Register