How Good Is PagedAttention at Memory Sharing?
Briefly

In evaluating memory sharing within PagedAttention, we found that vLLM significantly outperforms Orca in both parallel sampling and beam search, with substantial gains in memory efficiency.
Our results show that the savings grow as the number of parallel sequences increases, reaching up to 66.3% in some cases, with beam search benefiting the most.
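These savings come from block-level sharing of the KV cache: sequences that share a prefix, as in parallel sampling or beam search, point to the same physical blocks, and a block is duplicated only when a sequence writes into one that is still shared (copy-on-write). The sketch below illustrates this bookkeeping in Python under stated assumptions; the names (`Block`, `Sequence`, `fork`, `append_token`) and the block size are illustrative, not vLLM's actual API.

```python
# Minimal sketch of block-level KV-cache sharing with copy-on-write.
# Names and structure are illustrative only, not vLLM's implementation.

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed value)

class Block:
    def __init__(self):
        self.ref_count = 1
        self.tokens = []  # stand-in for the KV tensors stored in this block

class Sequence:
    def __init__(self, blocks=None):
        # Block table: logical block index -> physical Block
        self.blocks = blocks or []

    def fork(self):
        """Share all current blocks with a child (parallel sampling / beam search)."""
        for b in self.blocks:
            b.ref_count += 1
        return Sequence(list(self.blocks))

    def append_token(self, tok):
        if not self.blocks or len(self.blocks[-1].tokens) == BLOCK_SIZE:
            self.blocks.append(Block())
        last = self.blocks[-1]
        if last.ref_count > 1:
            # Copy-on-write: only the block being written is duplicated;
            # fully shared prefix blocks stay shared.
            last.ref_count -= 1
            copy = Block()
            copy.tokens = list(last.tokens)
            self.blocks[-1] = copy
            last = copy
        last.tokens.append(tok)

# Two samples forked from one prompt share every prompt block;
# physical memory grows only by the divergent suffix.
prompt = Sequence()
for t in range(40):      # 40 prompt tokens -> 3 blocks
    prompt.append_token(t)
child = prompt.fork()
child.append_token(100)  # triggers copy-on-write of the last block only
shared = sum(1 for a, b in zip(prompt.blocks, child.blocks) if a is b)
print(f"{shared} of {len(prompt.blocks)} prompt blocks still shared")
```

In this toy run, forking costs no memory at all, and generating one divergent token copies a single block; the longer the shared prompt relative to the divergent suffixes, the larger the fraction of memory saved, which is consistent with beam search showing the biggest gains.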