Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression
Dingcheng Zhen, Qian Qiao, Tan Yu, Kangxi Wu, Ziwei Zhang, Siyuan Liu, Shunshun Yin, Ming Tao
2025-06-17
Summary
This paper introduces TransDiff, an image generation model that combines two approaches: an Autoregressive Transformer and a diffusion model. TransDiff first uses the Transformer to encode its inputs into high-level features, then uses a smaller diffusion model to quickly generate a detailed image from those features. The paper also introduces Multi-Reference Autoregression, a technique that lets the model look at multiple images it has already generated in order to produce more varied, higher-quality images.
What's the problem?
The problem is that existing image generation models either take a long time to produce high-quality images or cannot quickly produce diverse, detailed ones. Autoregressive Transformers build images step by step, which can be slow, while diffusion models produce high-quality images but can also be slow. Before this work, there was no good way to get both speed and quality.
What's the solution?
The solution is TransDiff's design, which combines the strengths of both methods: it uses an Autoregressive Transformer to encode image features and a smaller diffusion model to produce the final images very quickly. On top of that, Multi-Reference Autoregression lets the model refer to several previously generated images, which helps it create more diverse, higher-quality images in later steps.
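To make the control flow concrete, here is a toy sketch of the two-stage idea with a multi-reference loop. It is not the authors' implementation: the function names (`encode_condition`, `diffusion_decode`, `multi_reference_generate`), the integer `label` input, the 4-dimensional vectors standing in for images, and the averaging and halving rules are all assumptions made for illustration. The real model uses a learned Transformer and a learned diffusion decoder.

```python
import random

FEATURE_DIM = 4  # toy size, chosen only for illustration


def encode_condition(label, references):
    """Hypothetical stand-in for the autoregressive Transformer.

    Maps an input label plus any previously generated images to one
    conditioning feature vector. Here: a label-dependent base vector
    averaged element-wise with all reference vectors.
    """
    base = [float(label + i) for i in range(FEATURE_DIM)]
    vectors = [base] + list(references)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(FEATURE_DIM)]


def diffusion_decode(feature, steps, rng):
    """Hypothetical stand-in for the diffusion decoder.

    Starts from Gaussian noise and moves halfway toward the conditioning
    feature at every step; a real decoder would apply a learned denoiser.
    """
    x = [rng.gauss(0.0, 1.0) for _ in feature]
    for _ in range(steps):
        x = [xi + 0.5 * (fi - xi) for xi, fi in zip(x, feature)]
    return x


def multi_reference_generate(label, rounds, steps=8, seed=0):
    """Generate `rounds` toy images, each conditioned on all earlier ones."""
    rng = random.Random(seed)
    images = []
    for _ in range(rounds):
        feature = encode_condition(label, images)  # sees every earlier output
        images.append(diffusion_decode(feature, steps, rng))
    return images


images = multi_reference_generate(label=3, rounds=3)
print(len(images), len(images[0]))  # prints "3 4"
```

The point of the sketch is only the loop structure: each round encodes a condition from all earlier outputs, then decodes a new output from noise under that condition.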
Why it matters?
This matters because TransDiff improves both the speed and the quality of AI image generation, making it possible to create detailed, diverse pictures more quickly than before. This can help in fields such as art, design, and gaming, and in any area that needs realistic or creative images.
Abstract
TransDiff, combining an Autoregressive Transformer and diffusion models, achieves superior image generation performance and speed, while Multi-Reference Autoregression further enhances its quality and diversity.