Adversarial Flow Models
Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan
2025-12-01
Summary
This paper introduces a new class of generative models called adversarial flow models, which combine the strengths of two existing approaches: adversarial networks (like GANs) and flow models.
What's the problem?
Traditional generative models like GANs can be difficult to train because the generator needs to learn a complex and often unstable way to transform random noise into realistic data. Flow models are more stable but often require many steps to generate an image, which takes a lot of computing power and can lead to errors building up over time. Existing methods also often need to learn what the image should look like at *every* step of the generation process, which is inefficient.
What's the solution?
Adversarial flow models solve this with a generator that learns a direct, deterministic mapping from noise to data, the same optimal-transport mapping that flow-matching models target. This makes training more stable, like flow models, but without requiring the model to learn every intermediate step of the generation process, as many other methods do. The model is trained adversarially: a generator and a discriminator compete against each other, and this competition pushes the generator toward realistic outputs. The authors demonstrate that their model achieves high-quality image generation with fewer steps and less computational effort than previous approaches.
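The adversarial objective mentioned above is the standard GAN setup: a discriminator is trained to tell real samples from generated ones, while the generator is trained to fool it. Below is a minimal numpy sketch of those two losses for a one-step generator. The affine `generator` and `disc_logit` functions are invented toy stand-ins for illustration only; the paper's actual networks are large deep models, and its specific loss formulation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w, b):
    # hypothetical one-step generator: a single map from noise z to a sample
    return w * z + b

def disc_logit(x, a, c):
    # hypothetical discriminator returning a real-vs-fake logit
    return a * x + c

def bce_with_logits(logits, target):
    # numerically stable binary cross-entropy with logits
    return np.mean(np.maximum(logits, 0) - logits * target
                   + np.log1p(np.exp(-np.abs(logits))))

# toy batch: "real" data and one-step generated samples
real = rng.normal(2.0, 0.5, size=256)
fake = generator(rng.normal(size=256), w=1.0, b=0.0)

# discriminator loss: push real logits toward 1 and fake logits toward 0
d_loss = (bce_with_logits(disc_logit(real, 1.0, 0.0), 1.0)
          + bce_with_logits(disc_logit(fake, 1.0, 0.0), 0.0))

# non-saturating generator loss: make fakes classified as real
g_loss = bce_with_logits(disc_logit(fake, 1.0, 0.0), 1.0)
print(d_loss, g_loss)
```

In a real training loop these two losses are minimized alternately with gradient descent on the discriminator and generator parameters, respectively.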
Why it matters?
This research is important because it offers a more efficient and stable way to generate realistic images. By reducing the number of steps needed for generation and simplifying the training process, it opens the door to more powerful and accessible generative models; the paper reports state-of-the-art results on image generation benchmarks like ImageNet.
Abstract
We present adversarial flow models, a class of generative models that unifies adversarial models and flow models. Our method supports native one-step or multi-step generation and is trained using the adversarial objective. Unlike traditional GANs, where the generator learns an arbitrary transport plan between the noise and the data distributions, our generator learns a deterministic noise-to-data mapping, which is the same optimal transport as in flow-matching models. This significantly stabilizes adversarial training. Also, unlike consistency-based methods, our model directly learns one-step or few-step generation without needing to learn the intermediate timesteps of the probability flow for propagation. This saves model capacity, reduces training iterations, and avoids error accumulation. Under the same 1NFE setting on ImageNet-256px, our B/2 model approaches the performance of consistency-based XL/2 models, while our XL/2 model creates a new best FID of 2.38. We additionally show the possibility of end-to-end training of 56-layer and 112-layer models through depth repetition without any intermediate supervision, and achieve FIDs of 2.08 and 1.94 using a single forward pass, surpassing their 2NFE and 4NFE counterparts.
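The "deterministic noise-to-data mapping, which is the same optimal transport as in flow-matching models" refers to the straight-line path flow matching draws between a noise sample and a data sample. A small numpy sketch of that path (the pairing and toy shapes are illustrative assumptions, and this is the conditional, per-pair velocity rather than the learned marginal field):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(2.0, 0.5, size=(4, 8))   # toy "data" samples
x1 = rng.normal(0.0, 1.0, size=(4, 8))   # paired noise samples

def interpolate(x0, x1, t):
    # straight-line path used by flow matching: x_t = (1 - t) * x0 + t * x1
    return (1.0 - t) * x0 + t * x1

def conditional_velocity(x0, x1):
    # along the straight path the velocity dx_t/dt is constant: x1 - x0
    return x1 - x0

# one Euler step of the reverse-time ODE from noise (t = 1) back to data (t = 0)
v = conditional_velocity(x0, x1)
x_hat = x1 - 1.0 * v          # step size spans the whole interval
print(np.allclose(x_hat, x0))  # → True: the straight path is exact in one step
```

Because the per-pair path is a straight line with constant velocity, a single Euler step recovers the data exactly; this is the sense in which a deterministic one-step generator can realize the same transport that flow models reach only after many integration steps.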