EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis
2025-02-18
Summary
This paper introduces EQ-VAE, a new way to improve AI systems that create images. It focuses on making the AI's internal representation of images simpler and more consistent, especially when images are changed in ways that don't affect their meaning, like scaling them up or rotating them.
What's the problem?
Current AI systems that create images use something called autoencoders to compress images into a simpler form. However, these autoencoders represent rotated or resized versions of the same image in very different ways, which makes the compressed representation unnecessarily complicated. This added complexity leads to slower training and less effective image creation.
What's the solution?
The researchers created EQ-VAE, which teaches autoencoders to handle rotated or resized images consistently. They do this by adding an extra training rule (a regularization term) that makes the autoencoder's compressed representation of a transformed image match the transformed representation of the original image. This makes the AI's understanding of images simpler without losing important details. They tested EQ-VAE with several top AI image generation systems and found it made them train faster and produce better results.
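To make the idea concrete, here is a minimal sketch of what such an equivariance penalty could look like. This is an illustrative example, not the paper's actual implementation: the function and variable names (`eq_penalty`, `toy_encode`, `rotate90`) are hypothetical, and a simple average-pooling "encoder" stands in for a real autoencoder. The penalty compares encoding a transformed image against transforming the encoding of the original image.

```python
import numpy as np

def rotate90(x):
    # Rotate a channels-last array (H, W, C) by 90 degrees in the spatial dims.
    return np.rot90(x, k=1, axes=(0, 1))

def eq_penalty(encode, image, transform):
    """Hypothetical equivariance penalty: the latent of a transformed image
    should match the transformed latent of the original image."""
    latent_of_transformed = encode(transform(image))
    transformed_latent = transform(encode(image))
    return np.mean((latent_of_transformed - transformed_latent) ** 2)

def toy_encode(x):
    # Toy stand-in "encoder": 2x2 average pooling, which compresses the
    # image 2x per spatial dim and happens to be rotation-equivariant.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

img = np.random.rand(8, 8, 3)
penalty = eq_penalty(toy_encode, img, rotate90)
# For this pooling encoder the penalty is ~0, since pooling commutes
# with 90-degree rotation; a real autoencoder generally would not,
# and minimizing this term during training pushes it to.
```

In training, a weighted version of this term would be added to the usual reconstruction loss, with transforms (rotations, rescalings) sampled randomly per batch.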
Why it matters?
This matters because it could make AI image creation faster and more efficient. It could lead to better AI tools for artists, designers, and anyone who works with digital images. The improvement is significant, making some systems work seven times faster, which could save a lot of time and computing power. Also, because EQ-VAE works with different types of AI systems, it could be used to improve many different image-creation tools.
Abstract
Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model to learn the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By finetuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA and MaskGIT, achieving a 7× speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/.