The Extreme LLM Compression Evolution: From QuIP to AQLM With PV-Tuning
Briefly

Advances in quantization now make it possible to compress neural network weights from 16 bits to 4 bits per parameter, and ongoing research aims to reduce model size eightfold by reaching 2 bits (a back-of-the-envelope sketch of this arithmetic follows the summary).
Yandex Research, together with collaborators, introduced a method that combines AQLM with PV-Tuning to achieve 8x compression; the code is freely available on GitHub.
Competition between the research teams behind compression algorithms such as QuIP and AQLM continues to drive innovation, breakthroughs, and optimizations in model compression.
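
For a rough sense of what these bit widths mean for model size, here is a minimal back-of-the-envelope sketch in Python. It is not the AQLM or PV-Tuning algorithm, only the storage arithmetic behind the "4x" and "8x" figures; the 7B parameter count is an assumed example, not a number from the article.

```python
# Illustrative sketch (not the AQLM/PV-Tuning method): approximate
# weight storage for a hypothetical 7B-parameter model quantized
# from 16-bit weights down to 4-bit and 2-bit.

def weights_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB at a given bit width."""
    return num_params * bits_per_weight / 8 / 2**30

params = 7_000_000_000  # assumed model size for illustration
baseline = weights_gib(params, 16)
print(f"16-bit baseline: ~{baseline:.1f} GiB")
for bits in (4, 2):
    size = weights_gib(params, bits)
    print(f"{bits}-bit weights: ~{size:.1f} GiB ({16 // bits}x smaller)")
```

At these assumed sizes, 16-bit weights take roughly 13 GiB, 4-bit weights about 3.3 GiB, and 2-bit weights about 1.6 GiB, which is the eightfold reduction the research targets.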
Read at HackerNoon