Hatevolution: What Static Benchmarks Don't Tell Us
Chiara Di Bonaventura, Barbara McGillivray, Yulan He, Albert Meroño-Peñuela
2025-06-17
Summary
This paper examines how language models are evaluated on their ability to recognize hate speech, and argues that static benchmarks can become misleading as hate speech evolves over time.
What's the problem?
Most current benchmarks for hate speech detection are static: they do not account for the fact that hateful language changes over time. A model that scored well on an older benchmark may therefore perform noticeably worse on newer forms of harmful language.
What's the solution?
The researchers evaluated language models on hate speech benchmarks drawn from different points in time. They found that model robustness does not hold up as the language evolves: performance that looks strong on a static snapshot degrades on later data. They argue that evaluations should include time-sensitive checks to better reflect how models perform in real, changing settings.
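To make the idea of a time-sensitive check concrete, here is a minimal sketch, not the authors' code: it scores a placeholder classifier separately on benchmark slices from different years rather than pooling everything into one number. The `classify` function, the lexicon terms, and the time-stamped examples are all hypothetical.

```python
# Minimal sketch of a time-sensitive evaluation (illustrative only).
# `classify` stands in for any hate speech classifier; the benchmark
# examples and their timestamps are hypothetical placeholders.

from collections import defaultdict

def classify(text: str) -> bool:
    """Placeholder classifier: flags texts containing terms from a
    fixed (and therefore static) lexicon."""
    static_lexicon = {"slur_a", "slur_b"}  # hypothetical terms
    return any(term in text.lower() for term in static_lexicon)

# Hypothetical benchmark with a year per example. The newest examples
# use an evolved term that the static lexicon has never seen.
benchmark = [
    ("post with slur_a", True, 2018),
    ("harmless post", False, 2018),
    ("post with slur_b", True, 2020),
    ("post with evolved_slur", True, 2023),  # evolved term, unseen
    ("another harmless post", False, 2023),
]

# Score the model separately on each time slice instead of pooling:
# a single aggregate score would hide degradation on newer data.
by_year = defaultdict(list)
for text, label, year in benchmark:
    by_year[year].append(classify(text) == label)

for year in sorted(by_year):
    results = by_year[year]
    print(f"{year}: accuracy {sum(results) / len(results):.2f}")
```

In this toy setup the pooled accuracy (0.80) would look acceptable, but slicing by year exposes the drop on the newest examples, which is exactly the kind of temporal misalignment a static benchmark hides.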
Why does it matter?
This matters because the language used to spread hate changes quickly, so testing models only on old data is not enough. Keeping these systems useful and safe requires evaluations that track how well they handle new and evolving hate speech, which in turn helps limit the spread of harmful content online.
Abstract
Empirical evaluation reveals temporal misalignment in the robustness of language models on evolving hate speech benchmarks, highlighting the need for time-sensitive linguistic assessments.