SambaNova Cloud serves up Llama 3.1 405B at 100+ token/s
Briefly

"According to CEO Rodrigo Liang, SambaNova has managed to get Meta's 405 billion parameter Llama 3.1 model to churn out tokens at a rate of 132 per second and at the full 16-bit precision it was trained at no less."
"To put that in perspective, its estimated the average person can read at about 5 words per second. At 132 tokens a second, SambaNova's system is nearly twice as fasts as the next fastest GPU systems."
Read at The Register