Import AI
Briefly

Researchers have developed PowerInfer, an inference engine that keeps some of a language model's neurons on a local GPU and offloads the rest to the CPU, showing significant efficiency improvements over llama.cpp.
PowerInfer exploits the power-law distribution of neuron activation in these models: the small set of hot-activated neurons is kept resident on the GPU for fast access, while cold neurons are computed on the CPU, reducing GPU memory demands and CPU-GPU data transfers.
Read at Import AI
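
To make the hot/cold split concrete, here is a minimal NumPy sketch of the idea, assuming a profiled per-neuron activation frequency and a fixed GPU budget. The function names, the threshold, and the CPU-only stand-in for GPU execution are illustrative assumptions, not PowerInfer's actual code.

import numpy as np

def split_neurons(activation_freq, gpu_budget):
    """Assign the most frequently activated ("hot") neurons to the GPU,
    up to a fixed budget; the remaining "cold" neurons stay on the CPU."""
    order = np.argsort(activation_freq)[::-1]   # most active first
    hot = order[:gpu_budget]                    # would be GPU-resident
    cold = order[gpu_budget:]                   # stays CPU-resident
    return hot, cold

def layer_forward(x, W, hot, cold, predicted_active):
    """Compute one layer's output, skipping neurons predicted inactive.
    Both partitions run on the CPU via NumPy purely for illustration."""
    out = np.zeros(W.shape[1])
    active_hot = np.intersect1d(hot, predicted_active)
    active_cold = np.intersect1d(cold, predicted_active)
    # Hot partition: in PowerInfer these weights live in GPU memory.
    out[active_hot] = x @ W[:, active_hot]
    # Cold partition: computed locally, avoiding PCIe transfers.
    out[active_cold] = x @ W[:, active_cold]
    return out

# Toy usage: 8 input dims, 16 neurons, GPU budget of 4 hot neurons.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
freq = rng.random(16)                           # hypothetical profiled frequencies
hot, cold = split_neurons(freq, gpu_budget=4)
y = layer_forward(rng.standard_normal(8), W, hot, cold,
                  predicted_active=np.arange(16)[freq > 0.5])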