Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

2024-09-04

Summary

This paper presents a new method for compressing text-to-image diffusion models, making them smaller and more efficient while maintaining high-quality image generation.

What's the problem?

Text-to-image diffusion models are powerful tools that create images based on text descriptions, but they often require a lot of computing power and storage because they have billions of parameters. This makes them hard to use, especially in situations where resources are limited.

What's the solution?

The authors propose using a technique called vector quantization (VQ) to compress these models more effectively than previous methods, which mainly relied on uniform scalar quantization. With VQ, they can compress the model weights to about 3 bits per parameter, compared with the roughly 4 bits that earlier approaches achieve, while still keeping image quality high. They tailor the method specifically to large-scale models like SDXL and SDXL-Turbo, allowing for better performance with fewer resources.
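To make the idea concrete, here is a minimal sketch of vector quantization applied to a weight matrix: consecutive weights are grouped into short vectors, and each vector is replaced by the index of its nearest entry in a learned codebook. This illustrates the general VQ principle only, not the authors' exact procedure; the group size, codebook size, and use of k-means are illustrative choices.

```python
# Minimal vector-quantization sketch (illustrative, not the paper's algorithm).
import numpy as np
from sklearn.cluster import KMeans

def vq_compress(weights, group_size=8, codebook_size=256):
    """Quantize a weight matrix into (codebook, index array)."""
    flat = weights.reshape(-1, group_size)            # each row is one vector of 8 weights
    km = KMeans(n_clusters=codebook_size, n_init=4).fit(flat)
    codebook = km.cluster_centers_                    # shape (256, 8), stored in full precision
    indices = km.labels_.astype(np.uint8)             # one 8-bit index per 8 weights
    return codebook, indices

def vq_decompress(codebook, indices, shape):
    """Reconstruct an approximate weight matrix by codebook lookup."""
    return codebook[indices].reshape(shape)

# Usage: compress a toy 256x256 layer and measure reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
cb, idx = vq_compress(w)
w_hat = vq_decompress(cb, idx, w.shape)
print("mean abs error:", np.abs(w - w_hat).mean())
```

In this toy setup, each group of 8 weights costs a single 8-bit index (1 bit per weight) plus a share of the codebook. Practical schemes like the paper's tune the vector and codebook sizes, and add extras such as per-group scales, to land at a target budget like 3 bits per parameter with much lower reconstruction error.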

Why it matters?

This research is important because it makes advanced image generation technology more accessible to a wider range of users, including those with limited computing resources. By improving the efficiency of these models, it opens up new possibilities for creative applications in fields like art, advertising, and virtual reality.

Abstract

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in size and already contain billions of parameters. As a result, state-of-the-art text-to-image models are becoming less accessible in practice, especially in resource-limited environments. Post-training quantization (PTQ) tackles this issue by compressing the pretrained model weights into lower-bit representations. Recent diffusion quantization techniques primarily rely on uniform scalar quantization, providing decent performance for models compressed to 4 bits. This work demonstrates that more versatile vector quantization (VQ) may achieve higher compression rates for large-scale text-to-image diffusion models. Specifically, we tailor vector-based PTQ methods to recent billion-scale text-to-image models (SDXL and SDXL-Turbo), and show that diffusion models with over 2B parameters compressed to around 3 bits using VQ exhibit image quality and textual alignment similar to previous 4-bit compression techniques.
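As a back-of-the-envelope check on why ~3-bit weights matter at this scale, the arithmetic below compares storage for a roughly SDXL-sized model against an fp16 baseline (the 2.6B parameter count is an assumption for illustration; the abstract only states 2B+):

```python
# Rough weight-storage estimate; parameter count is an assumed example.
params = 2.6e9
fp16_gb = params * 16 / 8 / 1e9   # ~5.2 GB at 16 bits per weight
q3_gb   = params * 3  / 8 / 1e9   # ~0.98 GB at ~3 bits per weight
print(f"fp16: {fp16_gb:.2f} GB, 3-bit VQ: {q3_gb:.2f} GB")
```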