Abstract: For uniform scalar quantization, the error distribution is approximately a uniform distribution over an interval (which is also a 1-dimensional ball ...
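The uniform-error claim in the abstract above can be checked numerically. The following is a minimal sketch, not taken from the paper: the names `quantize` and `error_stats`, the Gaussian input, and the step size 0.1 are all illustrative assumptions. It rounds samples to the nearest multiple of a step Δ and measures the error, which should stay within [-Δ/2, Δ/2] and have variance close to Δ²/12, the variance of a uniform distribution on that interval.

```python
import random


def quantize(x, step):
    """Uniform scalar quantizer: round x to the nearest multiple of step."""
    return step * round(x / step)


def error_stats(step=0.1, n=100_000, seed=0):
    """Quantize n Gaussian samples and return (min, max, mean, variance) of the error.

    The Gaussian source and the default step are assumptions for illustration only.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        errors.append(x - quantize(x, step))
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    return min(errors), max(errors), mean, var


if __name__ == "__main__":
    step = 0.1
    lo, hi, mean, var = error_stats(step=step)
    print("error range:", lo, hi)               # expected to lie within +/- step / 2
    print("empirical variance:", var)
    print("uniform model step^2 / 12:", step ** 2 / 12)
```

The approximation holds here because the step is small relative to the spread of the input; with a coarse step the error distribution would depart from uniform.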
In an article recently posted to the Meta Research website, researchers focused on improving vector quantization for data compression and vector search. They introduced quantization with implicit ...
TurboQuant vector quantization is Google Research’s latest bid to shrink the KV cache burden in LLM inference. Instead of focusing on model weights, the method targets runtime memory, with claims of ...
Abstract: In neural audio coding, latent space quantization is often trained together with the rest of the model. In this work, we investigate the use of algebraic vector quantization (VQ) in a ...
Large language models (LLMs) aren’t actually giant computer brains. Instead, they are massive vector spaces in which the probabilities of tokens occurring in a specific order are encoded. Billions of ...
A vector quantization library originally transcribed from DeepMind's TensorFlow implementation and conveniently made into a package. It uses exponential moving averages to update the dictionary. VQ has ...
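To illustrate what an exponential-moving-average dictionary update can look like, here is a minimal pure-Python sketch. It is not the library's code or API: the function names `nearest` and `ema_update`, the `decay` and `eps` parameters, and the list-based data layout are hypothetical. The update rule follows the usual EMA formulation for VQ codebooks: keep a running count and a running sum of the vectors assigned to each code, and set each code to sum divided by count.

```python
def nearest(codebook, v):
    """Index of the code vector closest to v in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def ema_update(codebook, counts, sums, batch, decay=0.9, eps=1e-5):
    """Apply one EMA step in place and return the codebook.

    counts[i] is the running count of vectors assigned to code i,
    sums[i] is the running sum of those vectors (both hypothetical state).
    """
    k, d = len(codebook), len(codebook[0])
    batch_counts = [0] * k
    batch_sums = [[0.0] * d for _ in range(k)]

    # Assign every batch vector to its nearest code.
    for v in batch:
        i = nearest(codebook, v)
        batch_counts[i] += 1
        for j in range(d):
            batch_sums[i][j] += v[j]

    # Blend batch statistics into the running statistics, then refresh the codes.
    for i in range(k):
        counts[i] = decay * counts[i] + (1 - decay) * batch_counts[i]
        for j in range(d):
            sums[i][j] = decay * sums[i][j] + (1 - decay) * batch_sums[i][j]
            codebook[i][j] = sums[i][j] / (counts[i] + eps)  # eps avoids division by zero
    return codebook


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    counts = [1.0, 1.0]
    sums = [row[:] for row in codebook]  # start consistent with the initial codes
    batch = [[0.2, 0.2]] * 4 + [[0.8, 0.8]] * 4
    for _ in range(200):
        ema_update(codebook, counts, sums, batch)
    print(codebook)  # each code drifts toward the mean of its assigned vectors
```

In this formulation the codes are moved by running statistics rather than by gradient descent, so only the encoder side needs a gradient-based loss.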