MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q. Knight, Harry R. Lloyd, Florence Bacus, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L. Gordon, Sydney Levine
2025-10-21
Summary
This paper investigates how well AI systems can reason through moral dilemmas, focusing not just on *what* decisions they make but also on *how* they arrive at those decisions.
What's the problem?
As we increasingly rely on AI for decision-making, it's crucial to understand its reasoning process, especially for complex issues like morality. Unlike subjects such as math, where there is a single right answer, moral dilemmas have multiple justifiable solutions. Existing ways to test AI reasoning, like benchmarks for math or coding, don't effectively evaluate how AI handles these nuanced moral situations or whether its reasoning aligns with human values. Until now, there was no good way to test and measure, in a detailed and comprehensive fashion, an AI's ability to think through moral problems.
What's the solution?
The researchers created two new benchmarks, MoReBench and MoReBench-Theory. MoReBench consists of 1,000 moral scenarios, each paired with a detailed set of rubric criteria – over 23,000 in total – outlining what a good moral reasoning process should include (or avoid), like identifying important considerations, weighing trade-offs between options, and giving an actionable recommendation. MoReBench-Theory adds 150 examples that test whether AI can reason under five major frameworks in normative ethics. The researchers then evaluated current AI models against these benchmarks to see how well they performed.
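To make the scenario-plus-rubric structure concrete, here is a minimal sketch in Python of how one such item and its rubric-based scoring *might* be represented. The names (`RubricCriterion`, `MoralScenario`, `score_reasoning`), the per-criterion weights, and the toy scenario are all illustrative assumptions, not the paper's actual data format or scoring method.

```python
from dataclasses import dataclass, field

@dataclass
class RubricCriterion:
    # Text of the criterion experts expect the reasoning to satisfy (or avoid).
    description: str
    # True if the reasoning SHOULD include this; False if it should avoid it.
    desirable: bool = True
    # Relative importance when aggregating into a scenario-level score
    # (a uniform weight of 1.0 is our assumption, not the paper's).
    weight: float = 1.0

@dataclass
class MoralScenario:
    prompt: str
    criteria: list[RubricCriterion] = field(default_factory=list)

def score_reasoning(satisfied: list[bool], scenario: MoralScenario) -> float:
    """Aggregate per-criterion judgments into a weighted score in [0, 1].

    satisfied[i] records whether an external judge (human or model) found
    that the reasoning trace met criterion i.
    """
    total = sum(c.weight for c in scenario.criteria)
    earned = 0.0
    for met, c in zip(satisfied, scenario.criteria):
        # Desirable criteria earn credit when met; undesirable criteria
        # earn credit when the reasoning successfully avoids them.
        if met == c.desirable:
            earned += c.weight
    return earned / total if total else 0.0

# Toy example (invented, not drawn from the benchmark):
scenario = MoralScenario(
    prompt="A friend asks you to lie to their employer about their absence. "
           "What should they do?",
    criteria=[
        RubricCriterion("Identifies honesty vs. loyalty as competing considerations"),
        RubricCriterion("Gives a concrete, actionable recommendation"),
        RubricCriterion("Asserts a single universally correct answer", desirable=False),
    ],
)
print(score_reasoning([True, True, False], scenario))  # -> 1.0
```

The key design point this sketch illustrates is that the evaluation is process-focused: credit attaches to properties of the reasoning trace itself, not to whichever final verdict the model lands on.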
Why it matters?
This work is important because it provides a better way to evaluate AI's reasoning abilities in morally complex situations. The results show that simply making AI models bigger doesn't automatically make them better at moral reasoning, and that they can exhibit biases towards certain ethical viewpoints. By developing these benchmarks, the researchers are pushing for safer and more transparent AI systems that can make decisions aligned with human values and explain *why* they made those decisions.
Abstract
As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To support such evaluation, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23,000 criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, covering cases of AI advising humans on moral decisions as well as AI making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be a side effect of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.