StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models
Senmao Li, Kai Wang, Salman Khan, Fahad Shahbaz Khan, Jian Yang, Yaxing Wang
2025-12-22
Summary
This paper focuses on improving the speed of creating images using a type of AI model called Visual Autoregressive (VAR) models, which build images step-by-step, similar to how you might draw an image gradually.
What's the problem?
VAR models are really good at making high-quality images, but they take a long time to finish, especially when creating detailed images. Previous attempts to speed them up involved manually choosing which steps to skip, but this wasn't very smart because some steps are more important than others for the overall image quality.
What's the solution?
The researchers developed a system called StageVAR that analyzes the image creation process and figures out which steps are most crucial. They found that the early steps define the basic shape and content of the image, so those should be done carefully. Later steps just add finer details and can be sped up or simplified without hurting the overall image too much. StageVAR automatically adjusts the process based on this idea, making it faster without significantly reducing quality.
Why it matters?
This work is important because it makes VAR models much more practical for real-world use. By significantly speeding up image generation while maintaining good quality, it opens the door for faster and more efficient AI-powered image creation tools and applications.
Abstract
Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Although existing acceleration methods reduce runtime for large-scale steps, they rely on manual step selection and overlook the varying importance of different stages in the generation process. To address this challenge, we present StageVAR, a systematic study and stage-aware acceleration framework for VAR models. Our analysis shows that early steps are critical for preserving semantic and structural consistency and should remain intact, while later steps mainly refine details and can be pruned or approximated for acceleration. Building on these insights, StageVAR introduces a plug-and-play acceleration strategy that exploits semantic irrelevance and low-rank properties in late-stage computations, without requiring additional training. Our proposed StageVAR achieves up to 3.4x speedup with only a 0.01 drop on GenEval and a 0.26 decrease on DPG, consistently outperforming existing acceleration baselines. These results highlight stage-aware design as a powerful principle for efficient visual autoregressive image generation.
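To make the stage-aware idea concrete, here is a minimal sketch of the general principle the abstract describes: run early (small-scale) steps exactly, and replace a weight matrix with a truncated-SVD low-rank approximation at late (large-scale) steps. The function names (`low_rank_approx`, `stage_aware_linear`) and the `early_frac`/`rank` parameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def low_rank_approx(W, rank):
    # Truncated SVD: keep only the top-`rank` singular components of W,
    # reducing the cost of applying W when rank << min(W.shape).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]

def stage_aware_linear(x, W, step, total_steps, early_frac=0.4, rank=8):
    # Early steps (structure/semantics): exact projection, kept intact.
    # Late steps (detail refinement): cheap low-rank approximation.
    # `early_frac` and `rank` are hypothetical knobs for this sketch.
    if step < early_frac * total_steps:
        return x @ W
    return x @ low_rank_approx(W, rank)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
W = rng.normal(size=(16, 16))
y_early = stage_aware_linear(x, W, step=0, total_steps=10)   # exact
y_late = stage_aware_linear(x, W, step=9, total_steps=10)    # approximated
```

In a real VAR model the same switch would gate attention or MLP blocks rather than a single matrix product, and the cutoff step would be chosen from the kind of stage-importance analysis the paper performs instead of a fixed fraction.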