Toxicity Ahead: Forecasting Conversational Derailment on GitHub

Mia Mohammad Imran, Robert Zita, Rahat Rizvi Rahman, Preetha Chatterjee, Kostadin Damevski

2025-12-24

Summary

This paper focuses on the problem of toxic behavior in online open-source software communities and proposes a way to predict when conversations are about to become harmful, allowing for earlier intervention.

What's the problem?

Online communities, especially those building open-source software, can be plagued by toxic interactions which discourage people from contributing and can even cause projects to fail. Currently, identifying and addressing this toxicity relies heavily on manual moderation, which is time-consuming and doesn't scale well. There's a need for a way to automatically detect when a conversation is heading towards a toxic outcome *before* it actually happens.

What's the solution?

The researchers curated a dataset of both toxic and non-toxic conversations from GitHub discussions. They then used this data to build a system based on large language models (LLMs) – the same technology powering chatbots – that predicts when a conversation will 'derail' into toxicity. Their method works in two steps: first, the LLM summarizes the conversation's dynamics, and then, based on that summary, it estimates the likelihood of the conversation becoming toxic. They used a prompting technique called 'Least-to-Most' to help the LLM break the conversation down into simpler sub-questions before summarizing it, and tested the approach on different LLMs, reaching F1-scores of 0.901 with Qwen and 0.852 with Llama when predicting toxic derailments.
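The two-step pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the prompt wording, the `llm` callable, and the sub-questions are all assumptions standing in for the paper's actual Least-to-Most templates and model calls.

```python
# Hypothetical sketch of the two-step prompting pipeline described above.
# Step 1 builds a Least-to-Most style prompt that decomposes the thread into
# simpler sub-questions before asking for a Summary of Conversation Dynamics
# (SCD); step 2 asks for a derailment likelihood conditioned on that summary.

def build_scd_prompt(thread: list[str]) -> str:
    """Step 1: decompose, then summarize (illustrative prompt text)."""
    comments = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(thread))
    return (
        "Conversation so far:\n" + comments + "\n\n"
        "Q1: What is the overall tone of each participant?\n"
        "Q2: Are there tension triggers (blame, dismissiveness, sarcasm)?\n"
        "Q3: How has sentiment shifted across the comments?\n"
        "Using your answers, write a short Summary of Conversation Dynamics."
    )

def build_forecast_prompt(scd: str) -> str:
    """Step 2: estimate derailment likelihood from the SCD alone."""
    return (
        f"Summary of Conversation Dynamics:\n{scd}\n\n"
        "On a scale from 0 to 1, how likely is this conversation to derail "
        "into toxicity? Reply with a single number."
    )

def forecast_derailment(thread: list[str], llm, threshold: float = 0.3):
    """Two LLM calls; the paper reports results at a 0.3 decision threshold.

    `llm` is any callable mapping a prompt string to a response string.
    Returns (probability, flagged-for-moderation).
    """
    scd = llm(build_scd_prompt(thread))          # step 1: summarize dynamics
    score = float(llm(build_forecast_prompt(scd)))  # step 2: score derailment
    return score, score >= threshold
```

In practice `llm` would wrap a Qwen or Llama inference call; here any stub that returns a summary for step 1 and a numeric string for step 2 exercises the flow.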

Why it matters?

This research is important because it offers a potential solution for proactively managing toxicity in online communities. By automatically identifying potentially harmful conversations, moderators can intervene earlier, creating a more welcoming and productive environment for everyone involved in open-source software development. This could lead to more sustainable projects and increased participation from a wider range of contributors.

Abstract

Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. Preventing such toxicity before it emerges requires a clear understanding of how harmful conversations unfold. However, most proactive moderation strategies are manual, requiring significant time and effort from community maintainers. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. Our analysis reveals that toxicity can be forecast by tension triggers, sentiment shifts, and specific conversational patterns. We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline. First, we generate Summaries of Conversation Dynamics (SCDs) via Least-to-Most (LtM) prompting; then we use these summaries to estimate the likelihood of derailment. Evaluated on Qwen and Llama models, our LtM strategy achieves F1-scores of 0.901 and 0.852, respectively, at a decision threshold of 0.3, outperforming established NLP baselines on conversation derailment. External validation on a dataset of 308 GitHub issue threads (65 toxic, 243 non-toxic) yields an F1-score up to 0.797. Our findings demonstrate the effectiveness of structured LLM prompting for early detection of conversational derailment in OSS, enabling proactive and explainable moderation.
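The abstract reports F1-scores at a fixed decision threshold of 0.3. As a refresher on what that means, the sketch below computes F1 from predicted derailment probabilities and ground-truth labels; the toy data is illustrative and unrelated to the paper's datasets.

```python
# Computing F1 at a fixed decision threshold (0.3, as in the abstract).
# Probabilities at or above the threshold are treated as "will derail".

def f1_at_threshold(probs: list[float], labels: list[bool],
                    threshold: float = 0.3) -> float:
    preds = [p >= threshold for p in probs]
    tp = sum(p and y for p, y in zip(preds, labels))          # true positives
    fp = sum(p and not y for p, y in zip(preds, labels))      # false positives
    fn = sum((not p) and y for p, y in zip(preds, labels))    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A lower threshold like 0.3 trades some precision for recall, which suits moderation: missing a derailing thread is costlier than a spurious flag a moderator can dismiss.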