
D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu

2025-10-09

Summary

This paper focuses on detecting images created by a new type of artificial intelligence called visual autoregressive models, which are really good at generating realistic pictures. It's becoming harder to tell what's real and what's AI-generated, so this research aims to build a better detector.

What's the problem?

Traditionally, AI image generators like GANs and diffusion models left telltale signs that revealed their images weren't real. These new autoregressive models work differently, predicting images piece by piece like a puzzle, and they create images that are much more convincing, so the usual detection methods don't work as well. There are still subtle differences in how real and fake images are represented through the model's 'codebook' (its fixed vocabulary of visual building blocks), but those differences are hard to spot.
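To make the "piece by piece" idea concrete, here is a toy sketch of vector quantization, the codebook step these models use to turn image features into discrete tokens. The codebook size, feature dimension, and random data are all illustrative values, not anything from the paper's models.

```python
import numpy as np

# Toy vector quantization (VQ): map continuous patch features to the
# index of their nearest codebook entry. Sizes below are arbitrary.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 code vectors, 4-dim features
features = rng.normal(size=(5, 4))    # 5 patch feature vectors to quantize

# Distance from every feature to every code vector, then pick the nearest.
dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
tokens = dists.argmin(axis=1)         # discrete token ids, one per patch
quantized = codebook[tokens]          # the snapped-to-codebook version

print(tokens.shape, quantized.shape)  # (5,) (5, 4)
```

An autoregressive model then generates an image by predicting these token ids one after another, which is what makes the codebook's behavior a natural place to look for detection clues.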

What's the solution?

The researchers developed a new method called D^3QE that looks at the 'quantization error', essentially the tiny mismatch between an image's features and the codebook entries the AI model snaps them to. They noticed that real and fake images show different patterns in these errors and in how often certain 'codes' are used. They built a transformer network whose attention mechanism incorporates these codebook frequency statistics and fuses them with the image's semantic content to better distinguish real from AI-generated images. They also created a large dataset, called ARForensics, covering seven different autoregressive models to test their method.
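The two raw signals described above can be sketched in a few lines. This is a simplified illustration under assumed toy shapes, not the paper's actual pipeline: D^3QE feeds these statistics into a learned transformer, which this sketch does not attempt to reproduce.

```python
import numpy as np

# Sketch of the two signals D^3QE builds on: per-patch quantization error
# and how often each codebook entry gets used. All values are toy data.
rng = np.random.default_rng(1)
codebook = rng.normal(size=(32, 8))   # assumed small codebook
features = rng.normal(size=(100, 8))  # features from one image

dists = np.linalg.norm(features[:, None] - codebook[None, :], axis=-1)
tokens = dists.argmin(axis=1)

# Signal 1: quantization error (the information the codebook loses).
quant_error = features - codebook[tokens]

# Signal 2: frequency distribution over the codebook (code usage bias).
freq = np.bincount(tokens, minlength=len(codebook)) / len(tokens)

print(quant_error.shape, float(freq.sum()))  # (100, 8) 1.0
```

A detector would compare these statistics between real and generated images; the paper's contribution is learning that comparison with attention over the frequency statistics rather than hand-crafting a threshold.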

Why it matters?

This research is important because as AI image generation gets more advanced, it becomes easier to create fake images that could be used to spread misinformation or deceive people. Having reliable methods to detect these images is crucial for maintaining trust in visual information and preventing malicious use of this technology. This new method shows promising results in accurately identifying images created by these new autoregressive models, even when the images are altered to look more realistic.

Abstract

The emergence of visual autoregressive (AR) models has revolutionized image generation while presenting new challenges for synthetic image detection. Unlike previous GAN or diffusion-based methods, AR models generate images through discrete token prediction, exhibiting both marked improvements in image synthesis quality and unique characteristics in their vector-quantized representations. In this paper, we propose to leverage Discrete Distribution Discrepancy-aware Quantization Error (D^3QE) for autoregressive-generated image detection that exploits the distinctive patterns and the frequency distribution bias of the codebook existing in real and fake images. We introduce a discrete distribution discrepancy-aware transformer that integrates dynamic codebook frequency statistics into its attention mechanism, fusing semantic features and quantization error latent. To evaluate our method, we construct a comprehensive dataset termed ARForensics covering 7 mainstream visual AR models. Experiments demonstrate superior detection accuracy and strong generalization of D^3QE across different AR models, with robustness to real-world perturbations. Code is available at https://github.com/Zhangyr2022/D3QE.