Vector Quantization using Gaussian Variational Autoencoder
Tongda Xu, Wendi Zheng, Jiajun He, Jose Miguel Hernandez-Lobato, Yan Wang, Ya-Qin Zhang, Jie Tang
2025-12-09
Summary
This paper introduces a new way to create a type of image compression model called a Vector Quantized Variational Autoencoder, or VQ-VAE, directly from a simpler type of model called a Gaussian VAE. It aims to make building these compression models easier and more effective.
What's the problem?
VQ-VAEs are good at compressing images, but they're notoriously difficult to train because they involve turning continuous image data into discrete 'tokens'. This 'discretization' process can destabilize training and make it hard to get good results. Existing methods for converting a Gaussian VAE into a VQ-VAE also leave room for improvement.
What's the solution?
The researchers developed a technique called Gaussian Quant (GQ) that cleverly transforms a Gaussian VAE into a VQ-VAE *without* needing to retrain the whole thing. GQ essentially uses random Gaussian 'noise' samples as the discrete codebook and, for each latent, picks the noise sample closest to what the Gaussian VAE has already learned. They also prove a mathematical guarantee showing that this works well once the codebook is large enough, and propose a practical training trick called the 'target divergence constraint' (TDC) to help the Gaussian VAE work best with GQ.
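To make the idea concrete, here is a minimal sketch of the quantization step as described above: sample a fixed random Gaussian codebook, then map each posterior mean to its nearest codeword. The function name, shapes, and sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_quant(mu, codebook):
    """Nearest-codeword quantization against a random Gaussian codebook (sketch).

    mu:       (B, D) posterior means from a Gaussian VAE (hypothetical shapes)
    codebook: (K, D) i.i.d. standard-normal codewords
    Returns the chosen token indices and the quantized latents.
    """
    # Euclidean distance from every mean vector to every codeword
    dists = np.linalg.norm(mu[:, None, :] - codebook[None, :, :], axis=-1)
    idx = dists.argmin(axis=1)          # discrete token per latent
    return idx, codebook[idx]           # quantized latents are codewords

rng = np.random.default_rng(0)
K, D = 1024, 16                          # codebook size and latent dim (illustrative)
codebook = rng.standard_normal((K, D))   # the random Gaussian 'noise' codebook
mu = rng.standard_normal((8, D))         # stand-in posterior means
idx, z_q = gaussian_quant(mu, codebook)
```

Because the codebook is just seeded random noise, encoder and decoder can regenerate it from the seed; only the integer indices `idx` need to be stored or transmitted.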
Why does it matter?
This work is important because it provides a simpler and more effective way to build VQ-VAEs. GQ outperforms other existing VQ-VAE methods and also improves upon techniques for converting Gaussian VAEs to VQ-VAEs, meaning better image compression and potentially faster development of these types of models. The code being publicly available also allows others to build upon this research.
Abstract
Vector quantized variational autoencoder (VQ-VAE) is a discrete autoencoder that compresses images into discrete tokens. It is difficult to train due to discretization. In this paper, we propose a simple yet effective technique, dubbed Gaussian Quant (GQ), that converts a Gaussian VAE with a certain constraint into a VQ-VAE without training. GQ generates random Gaussian noise as a codebook and finds the noise closest to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic for training a Gaussian VAE for effective GQ, named the target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge. The source code is available at https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE.