D-AR: Diffusion via Autoregressive Models

Ziteng Gao, Mike Zheng Shou

2025-05-30

Summary

This paper introduces D-AR, a method that applies language-model techniques to image generation. It builds an image step by step, producing high-quality pictures that can be previewed while they are being generated and whose layout can be controlled.

What's the problem?

Most image generation models either take a long time to show any result or give users little control over how the image is built up. That is frustrating when you want to see and adjust the picture while it is being created.

What's the solution?

The researchers recast image generation as the kind of task language models already perform: producing a sequence one part at a time in a predictable order. Because the image is built up incrementally, users can see consistent previews as it develops and can control its layout, while the final image quality stays very high.
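To make the "one part at a time" idea concrete, here is a minimal toy sketch of an autoregressive generation loop that records intermediate prefixes as preview points. It is not the authors' implementation: the vocabulary size, sequence length, preview interval, and the `next_token_distribution` stand-in are all hypothetical, and a real system would replace the stand-in with a language-model forward pass and decode each prefix into an actual image.

```python
import random

# All constants below are hypothetical values chosen for illustration.
VOCAB_SIZE = 16     # number of distinct image tokens
SEQ_LEN = 8         # tokens generated per image
PREVIEW_EVERY = 4   # record a preview after this many tokens


def next_token_distribution(prefix):
    """Stand-in for a language-model forward pass (an assumption, not D-AR).

    Returns one probability per token id, given the tokens generated so far.
    """
    weights = [1.0] * VOCAB_SIZE
    if prefix:
        # Toy rule: favor the id right after the previous token.
        weights[(prefix[-1] + 1) % VOCAB_SIZE] += 4.0
    total = sum(weights)
    return [w / total for w in weights]


def generate(seed=0):
    """Sample tokens one at a time, saving prefixes as preview points."""
    rng = random.Random(seed)
    tokens, previews = [], []
    for step in range(SEQ_LEN):
        probs = next_token_distribution(tokens)
        token = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        tokens.append(token)
        if (step + 1) % PREVIEW_EVERY == 0:
            # A real system would decode this prefix into a rough image here.
            previews.append(list(tokens))
    return tokens, previews


tokens, previews = generate(seed=0)
```

Each preview is a prefix of the final sequence, so earlier previews stay consistent with the finished result. That is the property this sketch is meant to illustrate.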

Why does it matter?

This is important because it makes AI image generation more user-friendly and interactive, allowing artists, designers, and everyday users to get better results and more control over the creative process.

Abstract

Diffusion via Autoregressive models (D-AR) recasts the image diffusion process as a standard autoregressive task, achieving high-quality image generation with consistent previews and layout control using a large language model backbone.
