MC-SJD : Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration

Junhyuk So, Hyunho Kook, Chaeyeon Jang, Eunhyeok Park

2025-10-30


Summary

This paper introduces MC-SJD, a training-free method for speeding up image and video generation with autoregressive models.

What's the problem?

Autoregressive models can generate detailed images and videos, but they are slow because they produce their output one token at a time, much like writing a sentence one word at a time. This per-token generation often needs thousands of sequential steps for a single image. A recent technique called Speculative Jacobi Decoding (SJD) speeds this up by drafting several tokens in parallel and then verifying them. However, SJD samples each new draft independently, so the draft tokens keep changing from one iteration to the next. As a result, many drafts are rejected, which limits the speedup.
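SJD keeps or rejects each draft token with a probabilistic test rather than an exact match. The sketch below assumes that SJD's verification follows the standard speculative-sampling rule. Under that rule, a draft token x drawn from a distribution q is accepted with probability min(1, p(x)/q(x)), where p is the model's distribution at that position. If x is rejected, a replacement is sampled from the normalized residual max(p - q, 0). The helper name `verify_draft_token` is hypothetical and is not taken from the paper's code.

```python
import numpy as np


def verify_draft_token(p, q, x, rng):
    """Probabilistic check of one draft token (standard speculative-sampling rule).

    p: model distribution at this position (1-D array summing to 1).
    q: distribution the draft token x was sampled from.
    Returns (token, accepted). If x ~ q, the returned token is distributed as p.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Keep x with probability min(1, p(x) / q(x)); q[x] > 0 because x was drawn from q.
    if rng.random() < min(1.0, p[x] / q[x]):
        return int(x), True
    # Rejection only happens when p(x) < q(x), so some token has p > q and the
    # residual below is non-empty; resample from the mass that q under-covers.
    residual = np.maximum(p - q, 0.0)
    return int(rng.choice(len(p), p=residual / residual.sum())), False
```

Take p = [0.5, 0.3, 0.2] and q = [0.2, 0.3, 0.5]. In that case draft token 0 is always kept, but draft token 2 survives only 40% of the time (0.2 / 0.5). When token 2 is rejected, the replacement is token 0, the only token that p favors over q.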

What's the solution?

The researchers developed MC-SJD, which builds on SJD. The key idea is to keep draft tokens stable across iterations. MC-SJD does not sample each new draft independently. Instead, it uses coupling, a technique from information theory, to maximize the probability that the *same* draft token is sampled in consecutive iterations. Stable drafts are accepted much more often. Because the method preserves SJD's lossless property, the quality of the final image or video is unchanged. It requires only a single-line modification to the original SJD algorithm.
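Coupling helps because it raises the chance that a position's draft token stays the same between two consecutive iterations. Below, p_{t-1} and p_t denote that position's draft distributions at iterations t-1 and t. This notation is ours and may not match the paper's. The two lines give the standard agreement probabilities for independent sampling and for a maximal coupling. The uniform example is purely illustrative.

```latex
\begin{aligned}
\Pr_{\text{indep}}[x_t = x_{t-1}] &= \sum_{v} p_{t-1}(v)\, p_t(v), \\
\Pr_{\text{max-coupled}}[x_t = x_{t-1}] &= \sum_{v} \min\bigl(p_{t-1}(v),\, p_t(v)\bigr) = 1 - d_{\mathrm{TV}}(p_{t-1}, p_t).
\end{aligned}
% Example: if p_{t-1} = p_t = \mathrm{Uniform}(K), independent sampling agrees with
% probability K \cdot (1/K)^2 = 1/K, while the maximal coupling agrees with probability 1.
```

Since p(v) q(v) <= min(p(v), q(v)) for probabilities, the coupled agreement is never lower than the independent one. By the coupling inequality, it is also the highest agreement achievable while x_t is still distributed exactly as p_t. This is why coupling can stabilize drafts without changing the sampling distribution.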

Why does it matter?

This work is important because it makes autoregressive visual generation much faster and more practical. Compared with standard autoregressive decoding, the new method speeds up image generation by up to about 4.2 times and video generation by up to about 13.3 times, with no loss in output quality. This could have a big impact on fields like art, design, and entertainment, where generating realistic images and videos is crucial.

Abstract

While autoregressive (AR) modeling has recently emerged as a new paradigm in visual generation, its practical adoption is severely constrained by the slow inference speed of per-token generation, which often requires thousands of steps to produce a single sample. To address this challenge, we propose MC-SJD, a training-free, lossless parallel decoding framework designed to accelerate AR visual generation by extending the recently introduced Speculative Jacobi Decoding (SJD). Although SJD shows strong potential for accelerating AR generation, we demonstrate that token instability across iterations significantly reduces the acceptance rate, a limitation that primarily arises from the independent sampling process used during draft token generation. To overcome this, we introduce MC-SJD, an information-theoretic approach based on coupling, which substantially accelerates standard SJD by maximizing the probability of sampling identical draft tokens across consecutive iterations, all while preserving its lossless property. Remarkably, this method requires only a single-line modification to the existing algorithm, yet achieves substantial performance gains, delivering up to a ~4.2x acceleration in image generation and ~13.3x acceleration in video generation compared to standard AR decoding, without any degradation in output quality.
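The abstract describes replacing independent draft sampling with a coupling that maximizes the chance of redrawing an identical token. The sketch below implements the textbook maximal coupling of two categorical distributions: keep the previous token with probability min(1, p_new/p_prev), and otherwise resample from the residual. It then compares this coupling against independent resampling by Monte Carlo. We assume, without having checked the authors' code, that MC-SJD's single-line change is equivalent to this rule. The function names `coupled_redraw`, `independent_redraw`, and `agreement_rate` are hypothetical.

```python
import numpy as np


def coupled_redraw(prev_token, p_prev, p_new, rng):
    """Maximal-coupling redraw of a draft token (hypothetical helper).

    Keeps prev_token with probability min(1, p_new[prev] / p_prev[prev]);
    otherwise samples from the normalized residual max(p_new - p_prev, 0).
    If prev_token ~ p_prev, the returned token is distributed exactly as p_new.
    """
    if rng.random() < min(1.0, p_new[prev_token] / p_prev[prev_token]):
        return prev_token
    residual = np.maximum(p_new - p_prev, 0.0)
    return int(rng.choice(len(p_new), p=residual / residual.sum()))


def independent_redraw(prev_token, p_prev, p_new, rng):
    """Baseline: ignore the previous draft and sample afresh from p_new."""
    return int(rng.choice(len(p_new), p=p_new))


def agreement_rate(redraw, p_prev, p_new, n=20000, seed=0):
    """Monte Carlo estimate of P(new draft token == previous draft token)."""
    rng = np.random.default_rng(seed)
    same = 0
    for _ in range(n):
        prev = int(rng.choice(len(p_prev), p=p_prev))
        same += redraw(prev, p_prev, p_new, rng) == prev
    return same / n


if __name__ == "__main__":
    # Two nearby draft distributions for the same position (illustrative values).
    p_prev = np.array([0.5, 0.3, 0.2])
    p_new = np.array([0.45, 0.35, 0.2])
    print(agreement_rate(independent_redraw, p_prev, p_new))  # ≈ 0.37 = sum(p_prev * p_new)
    print(agreement_rate(coupled_redraw, p_prev, p_new))      # ≈ 0.95 = sum(min(p_prev, p_new))
```

The coupled redraw has the same accept-or-resample form as the standard speculative-sampling verification rule. Its new token therefore has the same distribution p_new as an independent draw, so the output distribution is unaffected while the draft is far more likely to repeat.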
