DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning
Daniil Ignatev, Nan Li, Hugh Mee Wong, Anh Dang, Shane Kaszefski Yaschuk
2025-09-15
Summary
This paper details the DeMeVa team's work on LeWiDi 2025, a shared task focused on handling disagreement between people who label data. The team investigated two main approaches to predicting how different annotators would label the same item.
What's the problem?
The core issue is that when multiple people label data, they often disagree. This disagreement makes it hard for computer models to learn accurately, because the training signal is conflicting. The challenge asks participants to predict how *each individual annotator* would label an item, acknowledging that labels aren't always objectively 'right' or 'wrong' but depend on the person's perspective.
What's the solution?
The team tried two main strategies. First, they used large language models (AI systems that generate text) with in-context learning, experimenting with different ways of selecting the examples shown to the model. Second, they used a technique called label distribution learning, which focuses on predicting the probability of each possible label, and fine-tuned a model called RoBERTa for this purpose. They found that both approaches could predict individual annotators' labels, and that aggregating the language models' per-annotator predictions into 'soft labels' (probabilities instead of hard choices) performed competitively.
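The aggregation step can be illustrated with a minimal sketch (the function name and label set here are hypothetical, not the authors' actual implementation): once a model has predicted a hard label for each annotator, the soft label for an item is simply the normalized count of each predicted label.

```python
from collections import Counter

def aggregate_to_soft_label(per_annotator_predictions):
    """Turn per-annotator hard labels (e.g. predicted via in-context learning)
    into a soft label: a probability distribution over the label set."""
    counts = Counter(per_annotator_predictions)
    total = len(per_annotator_predictions)
    return {label: count / total for label, count in counts.items()}

# Example: predicted labels for five annotators on one item
predictions = ["offensive", "offensive", "not_offensive", "offensive", "not_offensive"]
soft = aggregate_to_soft_label(predictions)
# soft is {"offensive": 0.6, "not_offensive": 0.4}
```

The resulting distribution can then be compared against the empirical distribution of the real annotators' labels, which is how soft-label predictions are typically scored in this setting.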
Why does it matter?
This work is important because it shows promising ways to handle disagreement in data labeling. By predicting how individual people would label items, and using those predictions to build more nuanced 'soft labels', we can train more accurate and robust machine learning models, especially in tasks where subjective judgment plays a large role. It suggests that modeling *who* is labeling, not just *what* is being labeled, is a valuable direction for future research.
Abstract
This system paper presents the DeMeVa team's approaches to the third edition of the Learning with Disagreements shared task (LeWiDi 2025; Leonardelli et al., 2025). We explore two directions: in-context learning (ICL) with large language models, where we compare example sampling strategies; and label distribution learning (LDL) methods with RoBERTa (Liu et al., 2019b), where we evaluate several fine-tuning methods. Our contributions are twofold: (1) we show that ICL can effectively predict annotator-specific annotations (perspectivist annotations), and that aggregating these predictions into soft labels yields competitive performance; and (2) we argue that LDL methods are promising for soft label predictions and merit further exploration by the perspectivist community.