Personalized Text-to-Image Generation with Auto-Regressive Models
Kaiyue Sun, Xian Liu, Yao Teng, Xihui Liu
2025-04-23
Summary
This paper shows that auto-regressive models can generate personalized images from text descriptions, reaching quality comparable to that of diffusion models through a two-stage training strategy.
What's the problem?
Text-to-image generation has become impressive, but personalizing the generated images for each user is still hard. Most of the best results so far have come from diffusion models, which can be slow and complicated.
What's the solution?
The researchers used auto-regressive models and trained them in two stages. In the first stage, they optimized the text embeddings so that the model better understands the text. In the second stage, they fine-tuned the model's transformer layers to make image generation more accurate and personal.
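The two-stage idea can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the "model" here is a single weight matrix `W` standing in for the transformer layers, `e` stands in for a learnable text embedding, and the squared-error loss, learning rate, and step count are all assumptions chosen for illustration. The sketch also assumes the embedding is held fixed during the second stage, which the summary does not specify.

```python
def forward(W, e):
    # Toy stand-in for the generator: output = W @ e.
    return [sum(w_ij * e_j for w_ij, e_j in zip(row, e)) for row in W]


def loss(W, e, target):
    # Squared error between the toy output and a target vector.
    return sum((o - t) ** 2 for o, t in zip(forward(W, e), target))


def grads(W, e, target):
    # Analytic gradients of the squared-error loss w.r.t. W and e.
    residual = [2.0 * (o - t) for o, t in zip(forward(W, e), target)]
    grad_W = [[r_i * e_j for e_j in e] for r_i in residual]
    grad_e = [
        sum(residual[i] * W[i][j] for i in range(len(W)))
        for j in range(len(e))
    ]
    return grad_W, grad_e


def train_two_stage(W, e, target, lr=0.05, steps=200):
    # Work on copies so the caller's parameters are not mutated.
    W = [row[:] for row in W]
    e = e[:]
    history = {"start": loss(W, e, target)}

    # Stage 1: update only the embedding; the weights stay frozen.
    for _ in range(steps):
        _, grad_e = grads(W, e, target)
        e = [e_j - lr * g for e_j, g in zip(e, grad_e)]
    history["after_stage1"] = loss(W, e, target)

    # Stage 2: update only the weights; the embedding stays frozen
    # (an assumption of this sketch).
    for _ in range(steps):
        grad_W, _ = grads(W, e, target)
        W = [
            [w - lr * g for w, g in zip(row, grad_row)]
            for row, grad_row in zip(W, grad_W)
        ]
    history["after_stage2"] = loss(W, e, target)
    return W, e, history
```

Calling `train_two_stage([[1.0, 0.0], [0.0, 0.0]], [0.1, 0.1], [1.0, 1.0])` shows the division of labor in this toy setting: with frozen weights, the embedding alone can only fit part of the target, and the remaining error is removed once the weights are fine-tuned in the second stage.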
Why does it matter?
This matters because it shows that auto-regressive models, a potentially faster and simpler alternative, can compete with diffusion models for personalized image creation, which could make these tools more accessible and efficient.
Abstract
Auto-regressive models achieve comparable performance to diffusion models in personalized image synthesis through a two-stage training strategy that optimizes text embeddings and fine-tunes transformer layers.