Transition Models: Rethinking the Generative Learning Objective

Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, Lei Bai

2025-09-05

Summary

This paper introduces a new way to generate images, called Transition Models (TiM), that aims to be both high-quality and efficient.

What's the problem?

Currently, creating realistic images with AI involves a trade-off. Methods that produce incredibly detailed images take a long time and a lot of computing power. Faster methods sacrifice quality, meaning they can't create images that look as good. This happens because existing techniques either focus on making tiny, incremental changes to an image or try to predict the final image directly, both of which have limitations.

What's the solution?

The researchers developed a new mathematical equation that describes how images change over time during the generation process. This allows their model, TiM, to take steps of any size – it can make big leaps to quickly get a rough image, then refine it with smaller steps for detail. This approach allows TiM to achieve high quality without needing as much computing power as other methods.
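The sampling loop this enables can be sketched as follows. This is a toy illustration, not the paper's implementation: `transition_model(x, s, t)` is a hypothetical stand-in for a trained TiM that maps the state at time `s` to the state at an earlier time `t`, and the linear `toy_model` exists only to make the sketch runnable. The key point is that the same loop works with one giant leap (`num_steps=1`) or many small refinement steps.

```python
import numpy as np

def transition_sample(transition_model, shape, num_steps, rng=None):
    """Sample by applying learned finite-interval state transitions.

    transition_model(x, s, t) is a hypothetical stand-in for TiM:
    given state x at time s, it returns the state at an earlier time t.
    With num_steps=1 this is a single leap from noise to data; larger
    num_steps traverse the same trajectory in smaller intervals.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)            # start from pure noise at t=1
    times = np.linspace(1.0, 0.0, num_steps + 1)
    for s, t in zip(times[:-1], times[1:]):
        x = transition_model(x, s, t)         # one finite-interval transition
    return x

# Toy stand-in: linearly interpolate toward a fixed "image" target,
# purely so the sampler above can run end to end.
target = np.ones((4, 4))

def toy_model(x, s, t):
    return x + (s - t) / s * (target - x)

img = transition_sample(toy_model, (4, 4), num_steps=4)
```

With this toy model, any step count reaches the same endpoint; in the actual TiM, the model is trained so that coarse leaps give a fast rough result and additional steps monotonically improve quality.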

Why it matters?

TiM is important because it breaks the quality barrier for fast image generation. It outperforms larger, more complex models while using fewer resources, and its image quality actually *improves* as you allow it to take more steps, which isn't typical of faster generators. This means we can potentially get high-quality images much more quickly and efficiently, opening up possibilities for wider use of AI image generation.

Abstract

A fundamental dilemma in generative modeling persists: iterative diffusion models achieve outstanding fidelity, but at a significant computational cost, while efficient few-step alternatives are constrained by a hard quality ceiling. This conflict between generation steps and output quality arises from restrictive training objectives that focus exclusively on either infinitesimal dynamics (PF-ODEs) or direct endpoint prediction. We address this challenge by introducing an exact, continuous-time dynamics equation that analytically defines state transitions across any finite time interval. This leads to a novel generative paradigm, Transition Models (TiM), which adapt to arbitrary-step transitions, seamlessly traversing the generative trajectory from single leaps to fine-grained refinement with more steps. Despite having only 865M parameters, TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) across all evaluated step counts. Importantly, unlike previous few-step generators, TiM demonstrates monotonic quality improvement as the sampling budget increases. Additionally, when employing our native-resolution strategy, TiM delivers exceptional fidelity at resolutions up to 4096x4096.