TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin

2025-12-08

Summary

This paper introduces TwinFlow, a method for training text-to-image generation models that can produce an image in a single step instead of many.

What's the problem?

Current state-of-the-art image generation models produce high-quality results, but they are slow: they build up each image iteratively, requiring 40 to 100 evaluations of the model per image. Existing attempts to speed them up either require lengthy additional training, lose image quality when pushed to very few steps, or become unstable and demand large amounts of GPU memory.
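To see why the step count dominates inference cost, here is a toy sketch (not the paper's method) of how diffusion/flow-matching samplers work: they numerically integrate an ODE whose velocity field is a neural network, so each integration step costs one full network evaluation (NFE). The velocity function below is a made-up stand-in for that network.

```python
import numpy as np

def velocity(x, t):
    # Toy stand-in for the learned network v_theta(x, t); in a real
    # flow-matching model this call is a full forward pass.
    return -x / (1.0 + t)

def sample(x0, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps.
    Each step costs one function evaluation, so NFE == n_steps."""
    x, dt, nfe = x0, 1.0 / n_steps, 0
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
        nfe += 1
    return x, nfe

x0 = np.ones(4)                  # stands in for a noise sample
_, nfe_slow = sample(x0, 100)    # typical multi-step sampler
_, nfe_fast = sample(x0, 1)      # a one-step generator
print(nfe_slow, nfe_fast)        # 100 1
```

A 100-step sampler therefore pays 100 forward passes where a one-step generator pays one, which is exactly the 100x cost gap the paper targets.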

What's the solution?

TwinFlow is a way to train a model that generates images in just *one* step, without a fixed pre-trained 'teacher' model and without the separate adversarial networks that make other fast methods unstable. Instead, it uses a self-adversarial training scheme in which the model effectively plays both roles itself, so training stays simple and memory-efficient.

Why it matters?

This is important because it allows for much faster image generation, reducing the computational cost by a factor of 100 while maintaining similar image quality. This makes it practical to build and use very large, powerful image generation models without needing massive amounts of computing power.

Abstract

Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks like diffusion and flow matching, which inherently limits their inference efficiency (requiring 40-100 function evaluations (NFEs)). While various few-step methods aim to accelerate the inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or show significant degradation at very few steps (< 4-NFE). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for fixed pretrained teacher models and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 in 1-NFE, outperforming strong baselines like SANA-Sprint (a GAN loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B and transform it into an efficient few-step generator. With just 1-NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by 100x with minor quality degradation. Project page is available at https://zhenglin-cheng.com/twinflow.