Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
Linhao Zhong, Linyu Wu, Bozhen Fang, Tianjian Feng, Chenchen Jing, Wen Wang, Jiaheng Zhang, Hao Chen, Chunhua Shen
2026-01-13
Summary
This paper introduces a new way to build diffusion language models, a type of AI that generates text by repeatedly refining a draft. It focuses on letting these models improve their output step by step, rather than making firm decisions too early in the process.
What's the problem?
Current diffusion language models make quick, all-or-nothing choices about words: parts of the text are hidden behind 'masks' and then filled in with fixed tokens. Once a word is placed, the model cannot easily go back and revise it, and the probabilities it computes for alternative words are largely thrown away. In short, these models are not flexible enough in how they build text.
What's the solution?
The researchers developed a model called EvoToken-DLM. Instead of abruptly masking and unmasking words, EvoToken-DLM uses 'soft tokens' – think of them as blurry probabilities for different words that gradually become clearer. This allows the model to continuously refine its predictions and revisit earlier parts of the text. They also used a special training method called 'continuous trajectory supervision' to help the model learn this gradual refinement process effectively.
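To make the soft-token idea concrete, here is a minimal toy sketch of the general principle (not the paper's actual algorithm or training setup): every position holds a probability distribution over a tiny vocabulary, starting from a uniform "masked" state, and each step blends that distribution toward a model's predictions instead of committing to a single token. The `toy_model` function, vocabulary, and blending weight `alpha` are all illustrative stand-ins.

```python
# Toy illustration of decoding with evolving soft token distributions.
# This is a simplified sketch of the general idea, not EvoToken-DLM itself:
# the real model predicts distributions with a neural network and is trained
# with continuous trajectory supervision.

VOCAB = ["the", "cat", "sat", "mat"]

def toy_model(dists):
    """Stand-in for the language model: returns a target distribution per
    position. Here it deterministically favors one token per position."""
    preds = []
    for target in range(len(dists)):
        p = [0.05] * len(VOCAB)
        p[target] = 1.0 - 0.05 * (len(VOCAB) - 1)
        preds.append(p)
    return preds

def evolve(num_positions=4, steps=8, alpha=0.5):
    # Start every position at a uniform "soft mask" distribution.
    dists = [[1.0 / len(VOCAB)] * len(VOCAB) for _ in range(num_positions)]
    for _ in range(steps):
        preds = toy_model(dists)
        # Blend current soft tokens toward the model's predictions.
        # No position is ever hard-committed, so all stay revisable.
        dists = [
            [(1 - alpha) * d + alpha * p for d, p in zip(dist, pred)]
            for dist, pred in zip(dists, preds)
        ]
    # Only at the very end do we discretize to actual tokens.
    return [VOCAB[max(range(len(VOCAB)), key=d.__getitem__)] for d in dists]

print(evolve())  # -> ['the', 'cat', 'sat', 'mat']
```

The key contrast with hard masking is that intermediate states remain full distributions throughout decoding, so later steps can still shift probability mass away from an early front-runner.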
Why it matters?
This research is important because it improves the performance of diffusion language models, allowing them to generate higher-quality text. By enabling revisable decoding, the model can produce more nuanced and accurate outputs, outperforming comparable diffusion-based models in the paper's benchmarks. This could lead to better AI writing tools and more capable text generation systems.
Abstract
Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary masking and discrete token assignments, which hinder the revision of early decisions and underutilize intermediate probabilistic representations. In this paper, we propose EvoToken-DLM, a novel diffusion-based language modeling approach that replaces hard binary masks with evolving soft token distributions. EvoToken-DLM enables a progressive transition from masked states to discrete outputs, supporting revisable decoding. To effectively support this evolution, we introduce continuous trajectory supervision, which aligns training objectives with iterative probabilistic updates. Extensive experiments across multiple benchmarks show that EvoToken-DLM consistently achieves superior performance, outperforming strong diffusion-based and masked DLM baselines. Project webpage: https://aim-uofa.github.io/EvoTokenDLM.