Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
Runqian Wang, Yilun Du
2025-10-08
Summary
This paper introduces Equilibrium Matching, or EqM, a new way to create realistic images that improves upon existing methods like diffusion and flow-based models.
What's the problem?
Current image generation techniques, like diffusion and flow models, rely on simulating a time-dependent process: they learn to reverse a procedure that gradually adds noise to images, which takes many sequential steps. This makes them slow and computationally expensive, and not always the most efficient way to get high-quality results.
What's the solution?
EqM takes a different approach. Instead of modeling the step-by-step process of creating an image, it learns the 'shape' of an energy landscape whose low points correspond to realistic images – think of it like finding the bottom of a valley. To generate an image, it starts at a random point and 'rolls downhill' until it settles at a low point, which is a realistic image. Because this 'rolling downhill' is ordinary optimization, the step size, optimizer, and amount of compute can all be adjusted at inference time. In technical terms, EqM learns the equilibrium gradient of an implicit energy landscape.
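As a rough illustration, here is what such an optimization-based sampler could look like in PyTorch. This is a minimal sketch, not the paper's implementation: the name `grad_model` (a trained network that outputs the learned gradient at any image), the step count, and the step size are all illustrative, and the paper also allows adaptive optimizers in place of the plain step shown here.

```python
import torch

def eqm_sample(grad_model, shape, n_steps=100, step_size=0.01):
    """Minimal gradient-descent sampler over a learned landscape.

    `grad_model` is assumed to map a batch of images x to the learned
    equilibrium gradient at x; names and hyperparameters here are
    illustrative, not taken from the paper.
    """
    x = torch.randn(shape)  # start from pure noise
    for _ in range(n_steps):
        with torch.no_grad():
            g = grad_model(x)   # learned gradient of the implicit energy
        x = x - step_size * g   # one plain gradient-descent step
    return x

# Hypothetical usage: a single 256x256 RGB sample.
# sample = eqm_sample(trained_model, shape=(1, 3, 256, 256))
```

Because the learned landscape has no time conditioning, `n_steps` and `step_size` can be changed freely at inference, which is what the abstract below calls adjustable step sizes and adaptive compute.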
Why it matters?
EqM generates higher-quality images than current methods, as measured by FID, a standard quality score. It is also more versatile: the same learned landscape can denoise partially noised images, flag out-of-distribution images (ones that don't fit the training data), and compose multiple images into one. Ultimately, it provides a simpler and more efficient way to generate images and bridges the gap between flow-based and energy-based generative models.
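The composition use case follows naturally from the energy-based view: since energies add, the gradients of two landscapes can be summed and descended together. The sketch below assumes two trained gradient networks, here called `grad_a` and `grad_b`; it illustrates the standard energy-based composition trick under those assumptions, not necessarily the paper's exact procedure.

```python
import torch

def eqm_compose(grad_a, grad_b, shape, n_steps=200, step_size=0.01):
    """Sample from the composition of two learned landscapes.

    Summing the two gradient fields amounts to descending the sum of
    the underlying energies, the standard energy-based composition
    trick. All names and hyperparameters are illustrative.
    """
    x = torch.randn(shape)  # shared starting noise
    for _ in range(n_steps):
        with torch.no_grad():
            g = grad_a(x) + grad_b(x)  # gradient of the combined energy
        x = x - step_size * g          # descend the combined landscape
    return x
```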
Abstract
We introduce Equilibrium Matching (EqM), a generative modeling framework built from an equilibrium dynamics perspective. EqM discards the non-equilibrium, time-conditional dynamics in traditional diffusion and flow-based generative models and instead learns the equilibrium gradient of an implicit energy landscape. Through this approach, we can adopt an optimization-based sampling process at inference time, where samples are obtained by gradient descent on the learned landscape with adjustable step sizes, adaptive optimizers, and adaptive compute. EqM surpasses the generation performance of diffusion/flow models empirically, achieving an FID of 1.90 on ImageNet 256×256. EqM is also theoretically justified to learn and sample from the data manifold. Beyond generation, EqM is a flexible framework that naturally handles tasks including partially noised image denoising, OOD detection, and image composition. By replacing time-conditional velocities with a unified equilibrium landscape, EqM offers a tighter bridge between flow and energy-based models and a simple route to optimization-driven inference.