Price of universality in vector quantization is at most 0.11 bit
arXiv:2602.05790v1 Announce Type: cross Abstract: Fast computation of the matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat W$ in place of the true $W$ ("weight-only quantization"). Information theory demonstrates that an optimal algorithm for reducing the precision of $W$ depends on the (second-order) statistics of $X$ and requires careful alignment of the vector quantization codebook with the PCA directions of $X$ […]
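As context for the abstract, the sketch below shows the baseline setup it refers to: replacing $W$ with a low-precision $\widehat W$ in the product $W^\top X$. This is a minimal round-to-nearest scalar quantizer in NumPy, not the paper's PCA-aligned vector quantization scheme; all dimensions and the bit width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 64, 16, 32          # illustrative sizes, not from the paper
W = rng.standard_normal((d, m))
X = rng.standard_normal((d, n))

def quantize_weights(W, bits=4):
    """Per-column symmetric uniform quantization of W (weight-only).

    Returns the dequantized low-precision approximation W_hat,
    which has at most 2**bits - 1 distinct levels per column.
    """
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=0, keepdims=True) / levels
    q = np.clip(np.round(W / scale), -levels, levels)
    return q * scale

W_hat = quantize_weights(W, bits=4)

# Inference uses W_hat.T @ X in place of the exact product W.T @ X.
exact = W.T @ X
approx = W_hat.T @ X
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error of W_hat.T @ X at 4 bits: {rel_err:.4f}")
```

The paper's point is that this kind of data-oblivious quantizer is suboptimal: the best codebook for $\widehat W$ depends on the second-order statistics of $X$, and the abstract's title quantifies the cost of ignoring them ("universality") as at most 0.11 bit.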