Understanding and Predicting Derailment in Toxic Conversations on GitHub
Mia Mohammad Imran, Robert Zita, Rebekah Copeland, Preetha Chatterjee, Rahat Rizvi Rahman, Kostadin Damevski
2025-03-07
Summary
This paper presents a study of how conversations on GitHub, a platform for software developers, can turn toxic, and how to predict when that is about to happen.
What's the problem?
Sometimes, conversations between software developers on GitHub become negative or toxic, which makes it hard for people to work together and can drive away new contributors. The challenge is spotting that a conversation is about to turn bad before it actually does.
What's the solution?
The researchers collected a set of toxic and non-toxic conversations from GitHub and studied them closely. They found certain words, phrases, and patterns that often show up right before a conversation turns toxic. Using this information, they built a computer program that reads a conversation and predicts whether it is likely to become toxic. The program uses a modern large language model (LLM) to summarize how the conversation is evolving and spot early warning signs.
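To make the idea of "early warning signs" concrete, here is a minimal sketch of a marker-based heuristic. The paper reports that second-person pronouns, negation terms, and tones of frustration and impatience precede derailment; the specific word lists and the threshold below are illustrative assumptions, not the paper's actual features or model.

```python
import re

# Illustrative marker lists (assumptions for this sketch); the paper derives
# its markers empirically from an annotated GitHub dataset.
SECOND_PERSON = {"you", "your", "you're", "yours"}
NEGATION = {"not", "no", "never", "nothing", "can't", "don't", "won't"}
FRUSTRATION = {"again", "still", "ridiculous", "waste", "seriously"}


def marker_counts(comment: str) -> dict:
    """Count occurrences of each marker family in a single comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return {
        "second_person": sum(t in SECOND_PERSON for t in tokens),
        "negation": sum(t in NEGATION for t in tokens),
        "frustration": sum(t in FRUSTRATION for t in tokens),
    }


def flag_possible_derailment(conversation: list, threshold: int = 3) -> bool:
    """Flag a conversation if any single comment accumulates enough markers."""
    for comment in conversation:
        if sum(marker_counts(comment).values()) >= threshold:
            return True
    return False


convo = [
    "Thanks for the report, I can reproduce it on main.",
    "Seriously? You still don't get it, this is not what I asked for.",
]
print(flag_possible_derailment(convo))  # True: markers cluster in the second comment
```

A real proactive-moderation tool would of course use learned features and the conversational dynamics the paper describes rather than fixed word lists, but this shows the shape of a lexical early-warning signal.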
Why does it matter?
This matters because it could help keep GitHub a friendlier place for developers to work together. By catching potentially toxic conversations early, moderators could step in before things get worse, making it easier for people from all backgrounds to contribute to software projects without fear of negative interactions.
Abstract
Software projects thrive on the involvement and contributions of individuals from different backgrounds. However, toxic language and negative interactions can hinder the participation and retention of contributors and alienate newcomers. Proactive moderation strategies aim to prevent toxicity from occurring by addressing conversations that have derailed from their intended purpose. This study aims to understand and predict conversational derailment leading to toxicity on GitHub. To facilitate this research, we curate a novel dataset comprising 202 toxic conversations from GitHub with annotated derailment points, along with 696 non-toxic conversations as a baseline. Based on this dataset, we identify unique characteristics of toxic conversations and derailment points, including linguistic markers such as second-person pronouns, negation terms, and tones of Bitter Frustration and Impatience, as well as patterns in conversational dynamics between project contributors and external participants. Leveraging these empirical observations, we propose a proactive moderation approach to automatically detect and address potentially harmful conversations before escalation. By utilizing modern LLMs, we develop a conversation trajectory summary technique that captures the evolution of discussions and identifies early signs of derailment. Our experiments demonstrate that LLM prompts tailored to provide summaries of GitHub conversations achieve 69% F1-Score in predicting conversational derailment, strongly improving over a set of baseline approaches.
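The abstract's conversation trajectory summary technique can be illustrated with a prompt-construction sketch. The prompt wording below is a hypothetical reconstruction, assuming the general two-step structure the abstract describes (summarize the evolution of the discussion, then judge derailment risk); the paper's actual prompts are not reproduced here.

```python
def build_trajectory_prompt(comments: list) -> str:
    """Assemble a prompt asking an LLM to summarize a GitHub thread's
    trajectory and assess derailment risk. `comments` is a list of
    (author, text) pairs in chronological order. The instruction text
    is an assumption for illustration only."""
    thread = "\n".join(f"{author}: {text}" for author, text in comments)
    return (
        "Below is a GitHub issue conversation in chronological order.\n"
        "1. Summarize how the tone and focus evolve, comment by comment.\n"
        "2. State whether the discussion shows early signs of derailing "
        "into toxicity, and explain why.\n\n"
        f"Conversation:\n{thread}"
    )


prompt = build_trajectory_prompt([
    ("alice", "The build fails on Windows, steps to reproduce attached."),
    ("bob", "Works for me. Did you even read the docs?"),
])
print(prompt)
```

The resulting string would then be sent to an LLM; summarizing the trajectory before asking for a judgment is what lets the model reason over how the discussion evolved rather than over a single comment in isolation.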