#multi-head-latent-attention

Python
from PyImageSearch
1 month ago

KV Cache Optimization via Multi-Head Latent Attention - PyImageSearch

Multi-Head Latent Attention compresses per-head KV tensors into shared low-rank latents, cutting KV cache memory and compute while preserving attention quality.
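
To make the summary concrete, here is a minimal NumPy sketch of the caching idea: store one small latent vector per token and expand it into per-head keys and values only when attention is computed. It is not the linked article's implementation. The function name `mla_cache_sketch`, the weight names (`W_down`, `W_up_k`, `W_up_v`, `W_q`), and all dimensions are hypothetical, the weights are random, and details such as positional-encoding handling are left out.

```python
import numpy as np

def mla_cache_sketch(seq_len=16, d_model=64, n_heads=4, d_head=16, d_latent=8, seed=0):
    """Toy latent KV cache; every name and size here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((seq_len, d_model))  # token hidden states

    # Random projections stand in for learned weights.
    W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
    W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
    W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
    W_q = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)

    # Only this shared low-rank latent is cached: shape (seq_len, d_latent).
    c_kv = h @ W_down

    # Per-head keys and values are rebuilt from the latent at attention time.
    k = (c_kv @ W_up_k).reshape(seq_len, n_heads, d_head)
    v = (c_kv @ W_up_v).reshape(seq_len, n_heads, d_head)

    # Query for the most recent token, split per head.
    q = (h[-1] @ W_q).reshape(n_heads, d_head)

    # Scaled dot-product attention over all cached positions.
    scores = np.einsum("hd,shd->hs", q, k) / np.sqrt(d_head)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hs,shd->hd", weights, v)  # (n_heads, d_head)

    # Cached floats: full per-head K and V versus the shared latent.
    standard_cache = 2 * seq_len * n_heads * d_head
    latent_cache = c_kv.size
    return out, standard_cache, latent_cache

out, standard_cache, latent_cache = mla_cache_sketch()
print(out.shape, standard_cache, latent_cache)
```

With these toy sizes the cache holds `seq_len * d_latent` values instead of `2 * seq_len * n_heads * d_head`, so the saving depends entirely on how small `d_latent` can be made without hurting attention quality. This sketch does not measure that quality.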