
On the Role of Discreteness in Diffusion LLMs

Ziqi Jin, Bin Wang, Xiang Lin, Lidong Bing, Aixin Sun

2026-01-02

Summary

This paper examines diffusion models, a type of AI best known for generating images, and asks how well they work for generating text. It points out that while diffusion models offer real advantages for language, such as producing many words in parallel and refining a draft over several passes, the discrete and highly structured nature of text makes it tricky to apply them directly.

What's the problem?

The main issue is that existing diffusion models for text don't fully capture how language actually works. They either treat text as continuous data (smooth vectors, somewhat like audio) or as discrete pieces (individual tokens), and both approaches have drawbacks. Specifically, they don't account for the fact that some positions in a sentence carry far more information than others, and they struggle to capture relationships between multiple words when generating many words in parallel. Current models corrupt every position equally during training, which isn't how information is distributed in language, and they are trained on individual tokens rather than on how words depend on one another.
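To make the first issue concrete, here is a minimal Python sketch of uniform corruption, assuming a masking-style discrete diffusion forward process; the mask symbol, the schedule, and the example sentence are illustrative choices, not the paper's exact formulation.

```python
import random

MASK = "[MASK]"

def uniform_corruption(tokens, t):
    """Forward corruption at noise level t in [0, 1].

    Every position is masked independently with the same probability t,
    so an informative word ("Paris") and a filler word ("the") are
    destroyed at exactly the same rate -- the mismatch the paper points out.
    """
    return [MASK if random.random() < t else tok for tok in tokens]

sentence = ["The", "capital", "of", "France", "is", "Paris", "."]
print(uniform_corruption(sentence, t=0.5))
# e.g. ['The', '[MASK]', 'of', '[MASK]', 'is', '[MASK]', '.']
```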

What's the solution?

The researchers analyzed recent large diffusion language models and identified these key weaknesses. They categorized existing methods into two families: continuous diffusion, which works in the embedding space of words, and discrete diffusion, which operates directly on tokens. They then showed that each family satisfies only part of the five properties they outline for good diffusion language modeling, so every current design reflects a structural trade-off. By highlighting these gaps, they argue that future models should be designed to better reflect the inherent structure of language.
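For a concrete picture of the two families, the sketch below contrasts their corruption steps; the function names, the Gaussian noise schedule, and the mask symbol are assumptions made for illustration, not details taken from the paper.

```python
import random

def corrupt_continuous(embedding, t):
    """Continuous diffusion in embedding space: add Gaussian noise.

    Intermediate states are arbitrary vectors rather than valid tokens,
    so they must eventually be rounded back to the vocabulary.
    """
    return [x + random.gauss(0.0, t) for x in embedding]

def corrupt_discrete(tokens, t, mask="[MASK]"):
    """Discrete diffusion over tokens: replace tokens with a mask symbol.

    Intermediate states stay on the token grid, but the smooth, gradual
    refinement that continuous diffusion offers is lost.
    """
    return [mask if random.random() < t else tok for tok in tokens]

print(corrupt_continuous([0.2, -1.3, 0.7], t=0.5))
print(corrupt_discrete(["The", "cat", "sat"], t=0.5))
```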

Why it matters?

This work is important because it helps guide the development of better AI models for generating text. By understanding the limitations of current approaches, researchers can create models that produce more coherent, natural-sounding, and meaningful text. This could improve things like chatbots, writing assistants, and machine translation.

Abstract

Diffusion models offer appealing properties for language generation, such as parallel decoding and iterative refinement, but the discrete and highly structured nature of text challenges the direct application of diffusion principles. In this paper, we revisit diffusion language modeling from the views of the diffusion process and of language modeling, and outline five properties that separate diffusion mechanics from language-specific requirements. We first categorize existing approaches into continuous diffusion in embedding space and discrete diffusion over tokens. We then show that each satisfies only part of the five essential properties and therefore reflects a structural trade-off. Through analyses of recent large diffusion language models, we identify two central issues: (i) uniform corruption does not respect how information is distributed across positions, and (ii) token-wise marginal training cannot capture multi-token dependencies during parallel decoding. These observations motivate diffusion processes that align more closely with the structure of text, and encourage future work toward more coherent diffusion language models.
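As a toy illustration of issue (ii), and not the paper's own experiment, the snippet below samples two positions independently from their token-wise marginals and shows how parallel decoding can break multi-token dependencies.

```python
import random

# Toy data: only two two-token phrases are valid under the joint distribution.
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}

# Per-position marginals, which is all a token-wise training objective sees.
first = {"New": 0.5, "Los": 0.5}
second = {"York": 0.5, "Angeles": 0.5}

def sample(dist):
    """Draw one token from a {token: probability} dictionary."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

# Parallel decoding fills both positions independently, so mixed phrases
# like ("New", "Angeles") appear about half of the time.
draws = [(sample(first), sample(second)) for _ in range(10)]
invalid = [d for d in draws if d not in joint]
print(draws)
print(f"{len(invalid)}/10 draws are not valid phrases")
```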