
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Rulin Shao, Akari Asai, Shannon Zejiang Shen, Hamish Ivison, Varsha Kishore, Jingming Zhuo, Xinran Zhao, Molly Park, Samuel G. Finlayson, David Sontag, Tyler Murray, Sewon Min, Pradeep Dasigi, Luca Soldaini, Faeze Brahman, Wen-tau Yih, Tongshuang Wu, Luke Zettlemoyer, Yoon Kim, Hannaneh Hajishirzi, Pang Wei Koh

2025-11-25

Summary

This paper introduces a new method for training AI models to do in-depth research and write long, detailed answers, and then uses this method to create a powerful, openly available research model called DR Tulu-8B.

What's the problem?

Current AI models are good at answering simple questions where the answer can easily be checked for correctness. However, they struggle with complex research tasks that require multiple steps and long-form answers, because it's hard to give them feedback on something that isn't simply right or wrong. Existing training methods don't prepare them well for these realistic, complicated tasks.

What's the solution?

The researchers developed a new training technique called Reinforcement Learning with Evolving Rubrics (RLER). Essentially, they created a system where the standards for evaluating the AI's answers (the 'rubrics') aren't fixed, but instead change and improve *along with* the AI model as it learns. This lets the model receive more discriminative feedback as it explores new information and improves its research abilities. They then used this technique to build DR Tulu-8B, a model designed specifically for long-form research.
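The paper only describes RLER at a high level here, but the co-evolution idea can be pictured as a training loop in which each prompt's rubric pool is periodically revised using the policy's own rollouts. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `judge_score`, `propose_rubric_updates`, and `policy_update` are hypothetical placeholders standing in for an LLM judge, an LLM-based rubric-rewriting step, and the actual RL update used in the paper.

```python
# Minimal RLER-style sketch (hypothetical; not the paper's code).
# Each prompt keeps a pool of rubric criteria that co-evolves with the policy:
# rollouts are scored against the current rubrics, and the rubrics are then
# revised using information the policy newly explored, keeping feedback on-policy.

from dataclasses import dataclass, field


@dataclass
class ResearchPrompt:
    question: str
    rubrics: list[str] = field(default_factory=list)  # evolving criteria


def judge_score(answer: str, rubric: str) -> float:
    """Placeholder for an LLM judge: does `answer` satisfy `rubric`? (0..1)."""
    return float(any(tok in answer.lower() for tok in rubric.lower().split()))


def propose_rubric_updates(prompt: ResearchPrompt, rollouts: list[str]) -> list[str]:
    """Placeholder for rubric evolution: in practice an LLM would read the
    rollouts and add discriminative criteria based on newly surfaced evidence."""
    longest = max(rollouts, key=len)
    return prompt.rubrics + [f"mentions: {longest[:40]}"]


def rollout(policy, question: str, n: int = 4) -> list[str]:
    """Sample n long-form answers from the current policy."""
    return [policy(question) for _ in range(n)]


def policy_update(policy, question: str, answers: list[str], rewards: list[float]):
    """Placeholder for the RL step (e.g., a policy-gradient update on
    reward-weighted answers); a no-op in this sketch."""
    return policy


def rler_train(policy, prompts: list[ResearchPrompt], steps: int, refresh_every: int = 2):
    for step in range(steps):
        for p in prompts:
            answers = rollout(policy, p.question)
            # Reward = mean rubric satisfaction under the *current* rubric pool.
            rewards = [
                sum(judge_score(a, r) for r in p.rubrics) / max(len(p.rubrics), 1)
                for a in answers
            ]
            policy = policy_update(policy, p.question, answers, rewards)
            # Co-evolution: periodically revise rubrics with what the policy found.
            if step % refresh_every == 0:
                p.rubrics = propose_rubric_updates(p, answers)
    return policy
```

The key design point the sketch tries to capture is that rewards are always computed against the *current* rubric pool, so as the policy discovers new information, the evaluation criteria tighten alongside it rather than staying frozen at their initial form.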

Why does it matter?

This work is important because it creates the first openly available AI model that can perform complex, long-form research at a level comparable to, or even better than, existing proprietary (closed-source) systems. Because it's open source, other researchers can build upon this work, and it’s also more affordable to use than many of the alternatives. The release of the data, model, and code will help accelerate progress in the field of AI-powered research.

Abstract

Deep research models perform multi-step research to produce long-form, well-attributed answers. However, most open deep research models are trained on easily verifiable short-form QA tasks via reinforcement learning with verifiable rewards (RLVR), which does not extend to realistic long-form tasks. We address this with Reinforcement Learning with Evolving Rubrics (RLER), in which we construct and maintain rubrics that co-evolve with the policy model during training; this allows the rubrics to incorporate information that the model has newly explored and to provide discriminative, on-policy feedback. Using RLER, we develop Deep Research Tulu (DR Tulu-8B), the first open model that is directly trained for open-ended, long-form deep research. Across four long-form deep research benchmarks in science, healthcare and general domains, DR Tulu substantially outperforms existing open deep research models, and matches or exceeds proprietary deep research systems, while being significantly smaller and cheaper per query. To facilitate future research, we release all data, models, and code, including our new MCP-based agent infrastructure for deep research systems.