ModelCitizens: Representing Community Voices in Online Safety
Ashima Suvarna, Christina Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel
2025-07-10
Summary
This paper introduces ModelCitizens, a new dataset and a set of models designed to detect toxic language on social media more accurately by incorporating diverse community perspectives and the differing ways communities interpret harmful speech.
What's the problem?
Current tools for spotting toxic language often misclassify posts because they ignore annotators' community backgrounds and the context surrounding potentially harmful words, leading to unfair or inaccurate detection: the same phrase may be harmless in-group language in one community and a slur in another.
What's the solution?
The researchers built ModelCitizens by collecting toxicity annotations from members of many different communities, so the dataset captures a wide range of judgments about what counts as toxic. They then trained models on this data that take context into account and reflect community perspectives, allowing them to detect toxic language more fairly and accurately.
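To illustrate why community-sourced annotations matter, here is a minimal sketch of label aggregation. The record fields, community names, and functions are all hypothetical (not the actual ModelCitizens schema); the point is that a single global majority vote can erase a community's judgment that per-community aggregation preserves.

```python
from collections import defaultdict

# Hypothetical annotation records: one post labeled by annotators from
# two different communities (1 = toxic, 0 = not toxic). Field names and
# community labels are illustrative only.
annotations = [
    {"post_id": "p1", "community": "A", "label": 1},
    {"post_id": "p1", "community": "A", "label": 1},
    {"post_id": "p1", "community": "B", "label": 0},
    {"post_id": "p1", "community": "B", "label": 0},
    {"post_id": "p1", "community": "B", "label": 0},
]

def per_community_labels(records):
    """Majority vote computed separately within each community."""
    votes = defaultdict(list)
    for r in records:
        votes[r["community"]].append(r["label"])
    return {c: int(sum(v) / len(v) >= 0.5) for c, v in votes.items()}

def global_majority(records):
    """A single majority vote over all annotators combined."""
    labels = [r["label"] for r in records]
    return int(sum(labels) / len(labels) >= 0.5)

print(per_community_labels(annotations))  # {'A': 1, 'B': 0}
print(global_majority(annotations))       # 0
```

Here the global vote discards community A's unanimous view that the post is toxic, which is the kind of disagreement a community-informed dataset is designed to preserve.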
Why it matters?
Online safety improves when harmful language is detected reliably and fairly: users are better protected while diverse viewpoints are respected, making social media a safer and more inclusive place for everyone.
Abstract
MODELCITIZENS, a dataset with diverse toxicity annotations, and the community-informed models trained on it outperform existing tools at detecting toxic language in social media posts, underscoring the importance of context and community perspectives.