vAttention System Design: Dynamic KV-Cache with Contiguous Virtual Memory | HackerNoon
vAttention aims to make large language model inference more efficient by allocating physical memory for the KV-cache dynamically, as it is needed, while keeping the cache contiguous in virtual memory and minimizing physical memory waste.
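To make the idea concrete, here is a minimal toy sketch in Python of a KV-cache that reserves one contiguous virtual range up front and backs it with physical pages only as tokens arrive. It is a CPU-side simulation with plain counters, not vAttention's actual implementation; the class name `VirtualKVCache`, its methods, and the sizes used (512 bytes per token, 2048-byte pages) are all hypothetical and chosen only for illustration.

```python
class VirtualKVCache:
    """Toy model (hypothetical): a contiguous virtual range whose physical
    pages are mapped on demand. Nothing here touches real GPU memory."""

    def __init__(self, max_tokens: int, bytes_per_token: int, page_size: int):
        self.bytes_per_token = bytes_per_token
        self.page_size = page_size
        # Reserve virtual address space for the worst case.
        # No physical memory is committed at this point.
        self.virtual_bytes = max_tokens * bytes_per_token
        self.mapped_pages = 0
        self.tokens = 0

    def append_tokens(self, n: int) -> int:
        """Grow the cache by n tokens and return the number of newly mapped pages."""
        needed_bytes = (self.tokens + n) * self.bytes_per_token
        if needed_bytes > self.virtual_bytes:
            raise MemoryError("request exceeds the reserved virtual range")
        # Integer ceiling division: pages needed to cover needed_bytes.
        needed_pages = -(-needed_bytes // self.page_size)
        new_pages = max(0, needed_pages - self.mapped_pages)
        self.mapped_pages += new_pages
        self.tokens += n
        return new_pages

    def physical_bytes(self) -> int:
        """Bytes of physical memory currently backing the cache."""
        return self.mapped_pages * self.page_size

    def wasted_bytes(self) -> int:
        """Mapped physical bytes not yet holding any token data."""
        return self.physical_bytes() - self.tokens * self.bytes_per_token


cache = VirtualKVCache(max_tokens=1024, bytes_per_token=512, page_size=2048)
first = cache.append_tokens(1)   # first token forces one page to be mapped
second = cache.append_tokens(3)  # still fits in the already mapped page
third = cache.append_tokens(1)   # crosses a page boundary, maps one more page
print(first, second, third, cache.physical_bytes(), cache.wasted_bytes())
```

In this toy model the waste is bounded by less than one page per cache, whereas committing physical memory for the full `max_tokens` range up front would leave most of it unused for short sequences.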