Diversity Has Always Been There in Your Visual Autoregressive Models

Tong Wang, Guanyu Yang, Nian Liu, Kai Wang, Yaxing Wang, Abdelrahman M Shaker, Salman Khan, Fahad Shahbaz Khan, Senmao Li

2025-11-24

Summary

This paper focuses on improving the variety of images created by a recent class of image generation models called Visual Autoregressive (VAR) models, which are known for being fast and producing high-quality images.

What's the problem?

While VAR models are efficient and produce high-quality images, they tend to generate very similar outputs for a given input – a problem called 'diversity collapse'. This means they don't explore the full range of images they *could* create, limiting their usefulness for creative applications. A similar issue has been observed in other image generators, such as few-step distilled diffusion models.

What's the solution?

The researchers developed a technique called DiverseVAR that doesn't require any extra training of the model. Their analysis showed that a specific part of the model's internal feature map, called the 'pivotal component', governs how diversity forms at the early scales of generation. DiverseVAR works by suppressing this component in the model's input and amplifying it in the model's output, effectively unlocking the model's inherent potential for variety while keeping image quality intact.
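The suppress-then-amplify idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the 'pivotal component' is the top singular (rank-1) component of a 2-D feature map, and the scaling factors and function names are hypothetical.

```python
import numpy as np

def pivotal_component(feat):
    """Rank-1 approximation of a (tokens x channels) feature map,
    taken here as a stand-in for the paper's 'pivotal component'."""
    U, S, Vt = np.linalg.svd(feat, full_matrices=False)
    return S[0] * np.outer(U[:, 0], Vt[0])

def rescale_pivotal(feat, factor):
    """Replace the pivotal component of `feat` with a scaled copy,
    leaving the residual (all other components) untouched."""
    pivot = pivotal_component(feat)
    return (feat - pivot) + factor * pivot

# Illustrative use at an early generation scale (factors are made up):
feat = np.random.randn(16, 8)            # toy feature map
suppressed = rescale_pivotal(feat, 0.5)  # weaken pivotal part in the model input
amplified = rescale_pivotal(feat, 1.5)   # strengthen it in the model output
```

Because the rank-1 pivot is orthogonal to the remaining components, rescaling it changes only that one direction of the feature map, which is what lets the method adjust diversity without disturbing the rest of the representation.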

Why it matters?

This work is important because it addresses a key weakness of VAR models – their lack of diversity – without sacrificing their speed or image quality. By improving the variety of images generated, DiverseVAR makes these models more useful for applications where creativity and a wide range of outputs are needed.

Abstract

Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from diversity collapse, i.e., a reduction in output variability, analogous to that observed in few-step distilled diffusion models. In this paper, we introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training. Our analysis reveals the pivotal component of the feature map as a key factor governing diversity formation at early scales. By suppressing the pivotal component in the model input and amplifying it in the model output, DiverseVAR effectively unlocks the inherent generative potential of VAR models while preserving high-fidelity synthesis. Empirical results demonstrate that our approach substantially enhances generative diversity with only negligible impact on performance. Our code will be publicly released at https://github.com/wangtong627/DiverseVAR.