DARE: Diffusion Large Language Models Alignment and Reinforcement Executor

Jingyi Yang, Yuxian Jiang, Xuhao Hu, Shuang Cheng, Biqing Qi, Jing Shao

2026-04-08

Summary

This paper introduces DARE, a new open-source framework designed to make it easier to work with and improve diffusion large language models (dLLMs). These models are a newer type of AI that generates text differently than the more common autoregressive models, but currently lack standardized tools for development and testing.

What's the problem?

Currently, developing and comparing dLLMs is difficult because the tools and code used to refine these models after their initial training are scattered and specific to each research paper. This means researchers have to write a lot of custom code just to reproduce results or try out new ideas, slowing down progress and making it hard to fairly compare different approaches. It's like everyone building with different LEGO sets that don't connect!

What's the solution?

The authors created DARE, which stands for dLLMs Alignment and Reinforcement Executor. It's a unified system that provides all the necessary tools – like fine-tuning, preference optimization, and reinforcement learning – in one place. DARE works with several different dLLM families, making it a versatile platform for experimentation and evaluation. It builds upon existing tools like verl and OpenCompass to provide a complete solution.

Why it matters?

DARE is important because it provides a common foundation for dLLM research. By standardizing the post-training process, it allows researchers to more easily build upon each other's work, reproduce results, and develop new and improved dLLMs. This will ultimately speed up the advancement of this promising new type of AI and make it more accessible to the wider research community.

Abstract

Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their open-source ecosystem remains fragmented across model families and, in particular, across post-training pipelines, where reinforcement learning objectives, rollout implementations, and evaluation scripts are often released as paper-specific codebases. This fragmentation slows research iteration, raises the engineering burden of reproduction, and makes fair comparison across algorithms difficult. We present DARE (dLLMs Alignment and Reinforcement Executor), an open framework for post-training and evaluating dLLMs. Built on top of verl and OpenCompass, DARE unifies supervised fine-tuning, parameter-efficient fine-tuning, preference optimization, and dLLM-specific reinforcement learning under a shared execution stack for both masked and block diffusion language models. Across representative model families including LLaDA, Dream, SDAR, and LLaDA2.x, DARE provides broad algorithmic coverage, reproducible benchmark evaluation, and practical acceleration. Extensive empirical results position DARE as a reusable research substrate for developing, comparing, and deploying post-training methods for current and emerging dLLMs.
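The abstract's contrast between strictly sequential autoregressive decoding and iterative denoising with parallel generation can be illustrated with a toy masked-diffusion sampler: every position starts masked, and each denoising step commits the predictions the model is most confident about, several positions at a time. This is a minimal sketch of the general decoding pattern only; the `toy_model` stand-in, its confidence scores, and all names here are hypothetical and are not DARE's actual API.

```python
import random

MASK = "<mask>"

def toy_model(tokens):
    """Stand-in for a dLLM: returns a (prediction, confidence) pair for each
    masked position. A real model would predict over a vocabulary; here we
    echo a fixed target sentence so the denoising loop is easy to follow."""
    target = ["diffusion", "models", "denoise", "in", "parallel"]
    return {
        i: (target[i], random.random())
        for i, t in enumerate(tokens) if t == MASK
    }

def denoise(length=5, tokens_per_step=2, seed=0):
    """Masked-diffusion-style decoding: start fully masked, then repeatedly
    unmask the most confident positions until no masks remain."""
    random.seed(seed)
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens:
        preds = toy_model(tokens)
        # Commit up to `tokens_per_step` positions, highest confidence first.
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)
        for i in best[:tokens_per_step]:
            tokens[i] = preds[i][0]
        steps += 1
    return tokens, steps

print(denoise())  # fills 5 tokens in 3 steps (2 + 2 + 1), not 5
```

Because several positions are filled per step, the loop finishes in fewer iterations than the sequence length, which is the parallelism advantage the abstract refers to; autoregressive decoding would need one step per token.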