CoDA: Coding LM via Diffusion Adaptation

Haolin Chen, Shiyu Wang, Can Qin, Bo Pang, Zuxin Liu, Jielin Qiu, Jianguo Zhang, Yingbo Zhou, Zeyuan Chen, Ran Xu, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang, Weiran Yao

2025-10-08

Summary

This paper introduces CoDA, a new and relatively small (1.7 billion parameters) code-generating language model built on a type of artificial intelligence called a diffusion model. It's designed to write code, and the researchers are making the model and all the training tools freely available to others.

What's the problem?

Existing AI models that can generate code are often very large and require a lot of computing power, making them impractical for many users. While diffusion models have the potential to be better at understanding context and filling in missing parts of code, they haven't been successfully made into lightweight, usable systems. Essentially, good code-generating AI was either too big or didn't perform well enough.

What's the solution?

The researchers created CoDA by first training it on a massive amount of data using diffusion pre-training. Then, they refined it with more focused training on code and taught it to follow instructions. This process, combined with a confidence-guided sampling method, allows CoDA to generate code quickly and efficiently, even though it's much smaller than other similar models. They also provide all the code and instructions needed to train it yourself.
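To give a feel for how confidence-guided sampling works in a masked diffusion language model, here is a minimal sketch. This is an illustrative, MaskGIT-style decoding loop, not the paper's actual implementation: all positions start masked, and at each refinement step the model's most confident predictions are committed while the rest stay masked for later passes. The `MASK` token id, the toy logits table, and the per-step budget schedule are all assumptions for this example.

```python
import numpy as np

MASK = -1  # hypothetical mask token id (assumption, not from the paper)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def confidence_guided_decode(logits_fn, length, steps):
    """Fill a fully masked sequence over `steps` refinement passes,
    committing the highest-confidence predictions first."""
    seq = np.full(length, MASK, dtype=int)
    for step in range(steps):
        masked = np.flatnonzero(seq == MASK)   # positions still undecided
        if masked.size == 0:
            break
        probs = softmax(logits_fn(seq))        # (length, vocab)
        preds = probs[masked].argmax(axis=-1)  # best token per masked slot
        conf = probs[masked].max(axis=-1)      # model confidence per slot
        remaining = steps - step
        k = int(np.ceil(masked.size / remaining))  # budget for this pass
        keep = np.argsort(-conf)[:k]               # most confident slots
        seq[masked[keep]] = preds[keep]            # commit them
    return seq

# Toy stand-in for the model: fixed random logits per position.
rng = np.random.default_rng(0)
table = rng.normal(size=(16, 50))  # sequence length 16, vocab size 50
decoded = confidence_guided_decode(lambda s: table, length=16, steps=4)
```

Because only a fraction of positions is committed per pass, the whole sequence is produced in a handful of parallel model calls rather than one call per token, which is why this style of sampling keeps inference latency competitive with autoregressive decoding.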

Why it matters?

CoDA demonstrates that powerful code generation doesn't necessarily require enormous models. By releasing the model and training pipeline, the researchers hope to encourage more research into creating accessible and efficient AI coding assistants, potentially making AI-powered programming tools available to a wider audience and accelerating software development.

Abstract

Diffusion language models promise bidirectional context and infilling capabilities that autoregressive coders lack, yet practical systems remain heavyweight. We introduce CoDA, a 1.7B-parameter diffusion coder trained on TPU with a fully open-source training pipeline. CoDA pairs large-scale diffusion pre-training with code-centric mid-training and instruction tuning, enabling confidence-guided sampling that keeps inference latency competitive. On HumanEval, MBPP, and EvalPlus, CoDA-1.7B-Instruct matches or surpasses diffusion models up to 7B parameters. Our release includes model checkpoints, evaluation harnesses, and TPU training pipelines to accelerate research on lightweight diffusion-based coding assistants.