A single Nvidia RTX 3090 can serve a modest LLM such as Llama 3.1 8B at FP16, sustaining over 100 concurrent requests while supporting thousands of users. Backprop contends that, because only a small fraction of users make requests at any given moment, a single RTX 3090 is enough to power thousands of end users.
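The users-versus-concurrency argument is simple arithmetic. A minimal sketch, using the ~100 concurrent requests figure from the benchmark and an assumed 5% active fraction (the fraction of users mid-request at any instant is a hypothetical number, not one Backprop states):

```python
# Back-of-the-envelope user capacity estimate.
# max_concurrent_requests comes from the benchmark; active_fraction is an assumption.
max_concurrent_requests = 100
active_fraction = 0.05  # assume ~5% of users have a request in flight at any moment

supported_users = max_concurrent_requests / active_fraction
print(f"~{supported_users:.0f} users supported")  # ~2000 users
```

With a more conservative 10% active fraction, the same GPU would still cover roughly 1,000 users.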
Though the RTX 3090 is typically considered consumer hardware, Backprop's deployment demonstrates its capability: the card delivers 142 teraFLOPS of FP16 performance, ample for LLM inference workloads.
The RTX 3090's main limitation is memory capacity: with only 24 GB of VRAM, it cannot hold larger models such as Llama 3 70B, which is why Backprop opted for a smaller model.
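The memory constraint can be checked with a rough weights-only calculation (FP16 uses 2 bytes per parameter; KV cache and activations would add further overhead on top of this):

```python
# Weights-only VRAM estimate at FP16 (2 bytes per parameter).
# Ignores KV cache and activation memory, which add to the real requirement.
def weight_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * bytes_per_param  # billions of params * bytes = GB

vram_gb = 24  # RTX 3090
for name, billions in [("Llama 3.1 8B", 8), ("Llama 3 70B", 70)]:
    need = weight_gb(billions)
    verdict = "fits" if need < vram_gb else "does not fit"
    print(f"{name}: ~{need:.0f} GB of weights -> {verdict} in {vram_gb} GB")
```

Llama 3.1 8B needs roughly 16 GB for weights alone, leaving headroom for the KV cache within 24 GB, while Llama 3 70B would need around 140 GB at FP16, far beyond a single 3090.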