Personalized Text-to-Image Generation with Auto-Regressive Models

Kaiyue Sun, Xian Liu, Yao Teng, Xihui Liu

2025-04-23


Summary

This paper shows how auto-regressive models can generate personalized images from text descriptions, reaching quality comparable to the widely used diffusion models through a two-stage training process.
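An auto-regressive image model produces an image as a sequence of discrete tokens, predicting each new token from the text prompt and the tokens generated so far. The toy sketch below illustrates only that decoding loop, under loudly stated assumptions: the "model" is a single linear layer, the prefix is summarized by averaging token embeddings instead of using transformer attention, and decoding is greedy. The function `generate_image_tokens` and all shapes are hypothetical and are not the paper's model.

```python
import numpy as np


def generate_image_tokens(text_emb, token_emb, out_proj, num_tokens, start_token=0):
    """Greedy auto-regressive decoding of image tokens (toy sketch).

    text_emb:  (d,) embedding of the text prompt (hypothetical stand-in)
    token_emb: (V, d) embedding table for discrete image tokens
    out_proj:  (V, d) output layer standing in for the transformer
    """
    prefix = [start_token]
    for _ in range(num_tokens):
        # Condition on the text and on everything generated so far
        # (averaged here; a real transformer attends over the prefix).
        hidden = text_emb + token_emb[prefix].mean(axis=0)
        logits = out_proj @ hidden
        prefix.append(int(np.argmax(logits)))  # greedy next-token choice
    return prefix[1:]  # drop the start token


rng = np.random.default_rng(0)
d, vocab = 8, 16
tokens = generate_image_tokens(
    rng.normal(size=d), rng.normal(size=(vocab, d)), rng.normal(size=(vocab, d)),
    num_tokens=6,
)
print(tokens)
```

In a real system the generated token sequence would be decoded back into pixels by an image tokenizer; that step is omitted here.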

What's the problem?

Text-to-image generation has become very capable, but tailoring the generated images to a particular user is still hard. Most of the strongest personalization results so far have come from diffusion models, which can be slow and complex.

What's the solution?

To address this, the researchers train an auto-regressive model in two stages: first, they optimize the text embeddings to improve how the model interprets the prompt; then, they fine-tune the model's transformer layers to make the generated images more accurate and more personalized.
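The two-stage schedule can be illustrated with a deliberately tiny model. This is a sketch under hypothetical assumptions, not the paper's implementation: the "transformer layers" are replaced by one linear output layer `W`, the text embedding is a single learnable vector `v`, each prediction sees only the text embedding plus the previous image token, and gradients are written out by hand instead of using an autodiff framework. The names `loss_and_grads` and `train_two_stage`, the learning rate, and the step counts are all illustrative. Stage 1 updates only `v` with `W` frozen; stage 2 freezes `v` and updates only `W`.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def loss_and_grads(v, W, E, seq, start_token=0):
    """Mean next-token cross-entropy (teacher forcing) and its gradients.

    Toy model (hypothetical): h_t = v + E[prev_t], logits_t = W @ h_t.
    v: (d,) learnable text embedding; W: (V, d) stand-in for the
    transformer layers; E: (V, d) frozen image-token embeddings.
    """
    prevs = [start_token] + list(seq[:-1])
    loss, gv, gW = 0.0, np.zeros_like(v), np.zeros_like(W)
    for prev, target in zip(prevs, seq):
        h = v + E[prev]
        p = softmax(W @ h)
        loss -= np.log(p[target])
        dlogits = p.copy()
        dlogits[target] -= 1.0  # d(loss)/d(logits) = p - onehot(target)
        gv += W.T @ dlogits
        gW += np.outer(dlogits, h)
    n = len(seq)
    return loss / n, gv / n, gW / n


def train_two_stage(v, W, E, seq, lr=0.05, steps=(100, 100)):
    """Stage 1: tune only v (W frozen). Stage 2: tune only W (v frozen)."""
    v, W = v.copy(), W.copy()  # leave the caller's arrays untouched
    history = [loss_and_grads(v, W, E, seq)[0]]
    for _ in range(steps[0]):  # stage 1: text-embedding optimization
        _, gv, _ = loss_and_grads(v, W, E, seq)
        v -= lr * gv
    history.append(loss_and_grads(v, W, E, seq)[0])
    for _ in range(steps[1]):  # stage 2: fine-tune the model weights
        _, _, gW = loss_and_grads(v, W, E, seq)
        W -= lr * gW
    history.append(loss_and_grads(v, W, E, seq)[0])
    return v, W, history


rng = np.random.default_rng(0)
d, vocab = 8, 16
E = rng.normal(scale=0.5, size=(vocab, d))
W0 = rng.normal(scale=0.1, size=(vocab, d))
v0 = rng.normal(scale=0.1, size=d)
target_seq = rng.integers(0, vocab, size=6)  # stands in for a reference image's tokens
v1, W1, history = train_two_stage(v0, W0, E, target_seq)
print(history)  # loss at start, after stage 1, after stage 2
```

Ordering the stages this way means the cheap embedding update runs first with the pretrained weights untouched, and the weight update then only has to close the remaining gap; whether the paper uses the same optimizer or step budget is not stated here.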

Why does it matter?

This matters because it shows that auto-regressive models, a potentially faster and simpler alternative, can compete with diffusion models on personalized image generation, which could make these tools more accessible and efficient.

Abstract

Auto-regressive models achieve comparable performance to diffusion models in personalized image synthesis through a two-stage training strategy that optimizes text embeddings and fine-tunes transformer layers.
