OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment
Tianci Liu, Ran Xu, Tony Yu, Ilgee Hong, Carl Yang, Tuo Zhao, Haoyu Wang
2025-10-10
Summary
This paper focuses on improving how we teach AI models, specifically large language models, to align with what humans actually want. It introduces a new way to use detailed guidelines, called rubrics, to give the AI more nuanced feedback than simple 'good' or 'bad' ratings.
What's the problem?
Currently, training AI with human feedback often relies on simple judgments like ranking two responses against each other or giving a single score. This doesn't really capture all the different things humans consider when evaluating something, like creativity, accuracy, or helpfulness. While using detailed rubrics – sets of criteria for judging quality – is a good idea, creating these rubrics is hard work and doesn't scale well to many different tasks.
What's the solution?
The researchers created a large collection of prompts paired with rubrics, called OpenRubrics. They developed a technique called Contrastive Rubric Generation (CRG), in which a model writes rubrics by comparing a good and a bad response to the same prompt, extracting both hard rules (explicit constraints) and principles (implicit qualities). They also improved rubric quality by filtering out rubrics that disagree with the known human preference label. They then used these rubrics to train Rubric-RM, a rubric-based reward model that evaluates AI-generated text.
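The two steps above (contrastive generation, then consistency filtering) can be sketched in code. This is an illustrative outline, not the paper's released implementation: the function names, the prompt wording, and the `score` callable (standing in for a rubric-conditioned judge model) are all assumptions.

```python
# Hypothetical sketch of the OpenRubrics pipeline; names and prompt text are
# illustrative, not taken from the paper's code.
from typing import Callable, List


def crg_prompt(prompt: str, chosen: str, rejected: str) -> str:
    """Build a Contrastive Rubric Generation (CRG) prompt: the rubric
    generator sees both responses and is asked to extract hard rules
    (explicit constraints) and principles (implicit qualities)."""
    return (
        "Compare the two responses to the prompt below and write an "
        "evaluation rubric.\n"
        "List 'hard rules' (explicit constraints the prompt imposes) and "
        "'principles' (implicit qualities that make the preferred response "
        "better).\n\n"
        f"Prompt: {prompt}\n\n"
        f"Preferred response: {chosen}\n\n"
        f"Rejected response: {rejected}\n"
    )


def is_consistent(rubric: str, chosen: str, rejected: str,
                  score: Callable[[str, str], float]) -> bool:
    """Preference-label consistency check: keep a rubric only if scoring
    with it ranks the human-preferred response above the rejected one."""
    return score(rubric, chosen) > score(rubric, rejected)


def filter_rubrics(candidates: List[str], chosen: str, rejected: str,
                   score: Callable[[str, str], float]) -> List[str]:
    """Rejection sampling over candidate rubrics: discard any rubric that
    disagrees with the preference label, keeping only reliable ones."""
    return [r for r in candidates
            if is_consistent(r, chosen, rejected, score)]
```

In practice `score` would call a rubric-conditioned judge LLM; the filtering step is what the abstract refers to as enforcing preference-label consistency via rejection sampling.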
Why does it matter?
This work is important because it offers a more scalable way to align AI with human preferences. Rubrics provide richer feedback than simple scores, and OpenRubrics makes it easier to generate these rubrics automatically. This means we can get closer to having AI that consistently produces outputs that humans find truly helpful and desirable, without relying on expensive and time-consuming human evaluation for every single response.
Abstract
Reward modeling lies at the core of reinforcement learning from human feedback (RLHF), yet most existing reward models rely on scalar or pairwise judgments that fail to capture the multifaceted nature of human preferences. Recent studies have explored rubrics-as-rewards (RaR), which use structured natural-language criteria that capture multiple dimensions of response quality. However, producing rubrics that are both reliable and scalable remains a key challenge. In this work, we introduce OpenRubrics, a diverse, large-scale collection of (prompt, rubric) pairs for training rubric-generation and rubric-based reward models. To elicit discriminative and comprehensive evaluation signals, we introduce Contrastive Rubric Generation (CRG), which derives both hard rules (explicit constraints) and principles (implicit qualities) by contrasting preferred and rejected responses. We further improve reliability by enforcing preference-label consistency via rejection sampling to remove noisy rubrics. Across multiple reward-modeling benchmarks, our rubric-based reward model, Rubric-RM, surpasses strong size-matched baselines by 6.8%. These gains transfer to policy models on instruction-following and biomedical benchmarks. Our results show that rubrics provide scalable alignment signals that narrow the gap between costly human evaluation and automated reward modeling, enabling a new principle-driven paradigm for LLM alignment.