Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Chenghao Fan, Wen Heng, Bo Li, Sichen Liu, Yuxuan Song, Jing Su, Xiaoye Qu, Kai Shen, Wei Wei

2026-01-23

Summary

This paper investigates diffusion-based language models for generating code, specifically aiming to improve their performance compared to traditional autoregressive models.

What's the problem?

Diffusion models offer advantages such as block-wise generation and richer reuse of training data, but existing diffusion models for code have lagged behind autoregressive models when given comparable training data and compute, which suggests they weren't reaching their full potential.

What's the solution?

The researchers created a new model called Stable-DiffCoder, reusing the architecture, data, and training pipeline of the existing Seed-Coder model. They improved the training process by adding a block diffusion continual pretraining stage, carefully warming up the model, and clipping how much noise is added to each block during training, making learning from code more stable and efficient.
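
To make the noise-schedule idea concrete, here is a minimal, hypothetical sketch of what a block-wise clipped noise schedule could look like in masked-diffusion training. The block size, clipping range, and mask token id below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch: block-wise clipped noise for masked-diffusion training.
# BLOCK_SIZE, T_MIN/T_MAX, and MASK_ID are assumed values for illustration.
import torch

MASK_ID = 0               # assumed id of the [MASK] token
BLOCK_SIZE = 32           # assumed block length
T_MIN, T_MAX = 0.2, 0.8   # assumed clipping range for the per-block noise level

def add_block_noise(tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Mask each block of `tokens` (batch, seq_len) at an independently
    sampled, clipped noise level; returns the noised tokens and the mask."""
    batch, seq_len = tokens.shape
    num_blocks = seq_len // BLOCK_SIZE
    blocks = tokens.view(batch, num_blocks, BLOCK_SIZE)

    # One noise level per block, clipped so no block is nearly clean
    # or almost fully masked.
    t = torch.rand(batch, num_blocks, 1).clamp(T_MIN, T_MAX)
    mask = torch.rand(batch, num_blocks, BLOCK_SIZE) < t

    noised = blocks.masked_fill(mask, MASK_ID)
    return noised.view(batch, seq_len), mask.view(batch, seq_len)

# The model would then be trained to reconstruct the masked positions.
tokens = torch.randint(5, 1000, (2, 128))
noised, mask = add_block_noise(tokens)
```

The intuition, under these assumptions, is that clipping keeps every block at an informative noise level, so the model never trains on blocks that are either almost untouched or almost entirely masked.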

Why it matters?

This work demonstrates that diffusion-based training can actually *surpass* traditional autoregressive training for code generation, achieving better results with the same amount of data and the same model size. This is important because it opens up possibilities for better code editing, stronger reasoning, and, through data augmentation, improved performance in less common programming languages.

Abstract

Diffusion-based language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse compared to autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data, and training pipeline. To enable efficient knowledge learning and stable training, we incorporate a block diffusion continual pretraining (CPT) stage enhanced by a tailored warmup and block-wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder overall outperforms its AR counterpart on a broad suite of code benchmarks. Moreover, relying only on the CPT and supervised fine-tuning stages, Stable-DiffCoder achieves stronger performance than a wide range of ~8B ARs and DLLMs, demonstrating that diffusion-based training can improve code modeling quality beyond AR training alone. Finally, diffusion-based any-order modeling improves structured code modeling for editing and reasoning, and, through data augmentation, benefits low-resource coding languages.
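
As a rough illustration of the block-wise, any-order generation described above, the following hypothetical sketch decodes blocks left to right while filling tokens within each block in confidence order. The `model` interface, block size, and number of denoising steps are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch: block-wise diffusion decoding. Blocks are produced left
# to right; tokens *within* a block are revealed in any order, most confident
# first. MASK_ID, BLOCK_SIZE, STEPS, and `model` are assumed for illustration.
import torch

MASK_ID = 0
BLOCK_SIZE = 32
STEPS = 8  # denoising steps per block

@torch.no_grad()
def decode_block(model, prefix: torch.Tensor) -> torch.Tensor:
    """Append one fully denoised block to `prefix` of shape (1, prefix_len)."""
    block = torch.full((1, BLOCK_SIZE), MASK_ID, dtype=torch.long)
    for step in range(STEPS):
        # Assumed model signature: token ids -> per-position vocabulary logits.
        logits = model(torch.cat([prefix, block], dim=1))[:, -BLOCK_SIZE:]
        conf, pred = logits.softmax(-1).max(-1)

        # Reveal a growing fraction of positions, highest confidence first.
        still_masked = block == MASK_ID
        target = int(BLOCK_SIZE * (step + 1) / STEPS)
        k = max(1, target - int((~still_masked).sum()))
        conf = conf.masked_fill(~still_masked, -1.0)
        idx = conf.topk(k, dim=-1).indices[0]
        block[0, idx] = pred[0, idx]
    return torch.cat([prefix, block], dim=1)
```

In a full decoder one would call `decode_block` repeatedly until an end-of-sequence token appears; the point of the sketch is only that the ordering inside a block is decided by model confidence rather than strict left-to-right position.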