Can Large Language Models Capture Human Annotator Disagreements?
Jingwei Ni, Yu Fan, Vilém Zouhar, Donya Rooein, Alexander Hoyle, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, Elliott Ash
2025-06-25
Summary
This paper examines why large language models (LLMs) struggle to predict when human annotators disagree about data labels, even though the same models do well at predicting the most common (majority) label.
What's the problem?
Human annotators often disagree on subjective or ambiguous tasks because of differing opinions and interpretations. LLMs usually miss these disagreements and focus only on the majority answer, which overlooks important nuances in the data.
What's the solution?
The researchers evaluated different reasoning methods for LLMs. They found that Reinforcement Learning with Verifiable Rewards (RLVR), which trains models on tasks with clear, checkable answers, actually makes it harder for LLMs to predict disagreements. A simpler method, Chain-of-Thought reasoning, helped models do better at predicting disagreement patterns (see the sketch below for one way disagreement prediction can be scored).
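As a rough illustration of what "predicting disagreement" means, here is a minimal sketch that compares a model's predicted label distribution against the distribution of human annotations for one item, using total variation distance. The label names, annotation counts, and scoring choice are hypothetical and only serve to make the idea concrete; they are not taken from the paper.

```python
from collections import Counter

def annotation_distribution(labels):
    """Turn a list of human annotations into a probability distribution over labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def total_variation_distance(p, q):
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

# Hypothetical item: 10 annotators label a comment as "offensive" or "not offensive".
human_labels = ["offensive"] * 6 + ["not offensive"] * 4
human_dist = annotation_distribution(human_labels)  # {"offensive": 0.6, "not offensive": 0.4}

# A majority-only model puts all its probability mass on the majority label...
majority_pred = {"offensive": 1.0, "not offensive": 0.0}
# ...while a disagreement-aware model predicts a soft distribution over labels.
soft_pred = {"offensive": 0.65, "not offensive": 0.35}

print("majority-only model:", total_variation_distance(human_dist, majority_pred))        # 0.4
print("disagreement-aware model:", round(total_variation_distance(human_dist, soft_pred), 2))  # 0.05
```

A model that only reproduces the majority label scores poorly on this kind of metric even when its top answer is "correct", which is the gap the paper investigates.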
Why it matters?
Recognizing disagreement among human annotators is important for understanding complex and subjective tasks. If LLMs cannot capture it, they may produce less reliable annotations and miss important nuances in real-world applications.
Abstract
LLMs struggle to predict human annotation disagreements, despite performing well at predicting majority labels, and RLVR-style reasoning exacerbates this issue.