Training AI Co-Scientists Using Rubric Rewards
Shashwat Goel, Rishi Hazra, Dulhan Jayalath, Timon Willi, Parag Jain, William F. Shen, Ilias Leontiadis, Francesco Barbieri, Yoram Bachrach, Jonas Geiping, Chenxi Whitehouse
2025-12-30
Summary
This paper explores how to build better AI assistants for researchers, specifically focusing on their ability to create detailed plans for conducting research projects.
What's the problem?
Currently, AI models aren't very good at creating research plans that fully consider all the necessary requirements and limitations. They often miss important details or suggest plans that aren't practical. Training these models is difficult because getting feedback on research plans usually requires a human expert, which doesn't scale well.
What's the solution?
The researchers developed a method to train AI models using the large existing corpus of research papers. They automatically extracted the goals of these papers and, crucially, the criteria for *evaluating* plans toward those goals – essentially, grading rubrics. They then trained the model with reinforcement learning using self-grading: a frozen copy of the initial model acts as the grader, while the model being trained tries to create plans that satisfy the rubric criteria, all without constant human input. They tested this approach on research goals from machine learning and medical papers.
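To make the self-grading idea concrete, here is a minimal sketch of a rubric-based reward. All names here are hypothetical: the paper's actual grader is a frozen copy of the initial language model that judges each criterion, whereas this stub uses a simple keyword check as a stand-in signal.

```python
# Illustrative sketch of rubric-based self-grading (not the paper's code).
# In the real setup, grade_item would prompt a frozen copy of the initial
# LLM to judge whether the plan satisfies the criterion.
from dataclasses import dataclass

@dataclass
class RubricItem:
    criterion: str        # e.g. "Plan specifies a baseline to compare against"
    keywords: tuple       # hypothetical stand-in for an LLM grader's judgment

def grade_item(plan: str, item: RubricItem) -> int:
    # Returns 1 if the plan appears to satisfy the criterion, else 0.
    return int(any(k in plan.lower() for k in item.keywords))

def rubric_reward(plan: str, rubric: list) -> float:
    # Reward = fraction of rubric criteria judged satisfied. This scalar
    # would feed into the RL update for the plan-generating policy.
    scores = [grade_item(plan, item) for item in rubric]
    return sum(scores) / len(scores)

rubric = [
    RubricItem("States a clear baseline", ("baseline",)),
    RubricItem("Includes an ablation study", ("ablation",)),
    RubricItem("Names an evaluation metric", ("accuracy", "f1", "auc")),
]

plan = "We compare against a strong baseline and run an ablation over rubric size."
print(rubric_reward(plan, rubric))  # 2 of 3 criteria matched
```

Because the rubrics are goal-specific and checking a plan against a rubric is easier than writing the plan, the grader can meaningfully score plans even when it could not itself produce better ones (the generator-verifier gap the abstract mentions).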
Why it matters?
This work is important because it shows a way to create more helpful AI research assistants that can assist scientists without needing a lot of human supervision. It’s a step towards AI that can genuinely collaborate with researchers, speeding up the process of discovery and potentially tackling complex problems in fields like medicine where getting direct feedback on experiments is difficult.
Abstract
AI co-scientists are emerging as a tool to assist human researchers in achieving their research goals. A crucial feature of these AI co-scientists is the ability to generate a research plan given a set of aims and constraints. The plan may be used by researchers for brainstorming, or may even be implemented after further refinement. However, language models currently struggle to generate research plans that follow all constraints and implicit requirements. In this work, we study how to leverage the vast corpus of existing research papers to train language models that generate better research plans. We build a scalable, diverse training corpus by automatically extracting research goals and goal-specific grading rubrics from papers across several domains. We then train models for research plan generation via reinforcement learning with self-grading. A frozen copy of the initial policy acts as the grader during training, with the rubrics creating a generator-verifier gap that enables improvements without external human supervision. To validate this approach, we conduct a study with human experts for machine learning research goals, spanning 225 hours. The experts prefer plans generated by our finetuned Qwen3-30B-A3B model over the initial model for 70% of research goals, and approve 84% of the automatically extracted goal-specific grading rubrics. To assess generality, we also extend our approach to research goals from medical papers, and new arXiv preprints, evaluating with a jury of frontier models. Our finetuning yields 12-22% relative improvements and significant cross-domain generalization, proving effective even in problem settings like medical research where execution feedback is infeasible. Together, these findings demonstrate the potential of a scalable, automated training recipe as a step towards improving general AI co-scientists.