ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

Ryuto Koike, Masahiro Kaneko, Ayana Niwa, Preslav Nakov, Naoaki Okazaki

2025-02-18

Summary

This paper introduces ExaGPT, a new method for detecting whether a piece of text was written by a human or generated by an AI language model. ExaGPT works by comparing spans of the text to examples of known human-written and AI-generated text, making it easier for people to understand how it reaches its decisions.

What's the problem?

Current methods for detecting AI-generated text aren't very good at explaining their decisions in a way that humans can easily understand. This is important because wrongly identifying a text as AI-generated could have serious consequences, like unfairly accusing a student of cheating.

What's the solution?

The researchers created ExaGPT, which splits the text being analyzed into small spans and compares each span to a datastore of known human-written and AI-generated texts. The text is labeled according to which class its spans resemble more, and the retrieved similar spans are shown as evidence for the decision. This approach mimics how humans naturally try to spot AI-generated text, making the results more understandable.
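The retrieve-and-vote idea can be sketched with a toy nearest-example classifier. This is not the authors' implementation (ExaGPT builds on learned span representations and a large datastore); the word-level Jaccard similarity, the tiny in-memory datastore, and the simple majority vote below are illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two text spans (illustrative stand-in
    for the paper's span similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def classify_spans(spans, datastore):
    """Label each span by its most similar datastore example, then decide the
    whole text by majority vote. Returns the verdict plus per-span evidence
    (span, matched example, example label, similarity score)."""
    evidence = []
    for span in spans:
        example, label = max(datastore, key=lambda ex: jaccard(span, ex[0]))
        evidence.append((span, example, label, jaccard(span, example)))
    llm_votes = sum(1 for _, _, label, _ in evidence if label == "llm")
    verdict = "llm" if llm_votes > len(evidence) / 2 else "human"
    return verdict, evidence

# Hypothetical datastore of labeled reference spans.
datastore = [
    ("as an ai language model i cannot", "llm"),
    ("delve into the intricate tapestry of", "llm"),
    ("i reckon we'll head out around noon", "human"),
    ("my handwriting got messy after the long day", "human"),
]

spans = ["i cannot as an ai language model", "the intricate tapestry of ideas"]
verdict, evidence = classify_spans(spans, datastore)
```

The key interpretability point is that `evidence` is returned alongside the verdict: a user can inspect exactly which stored example each span matched and how strongly, rather than being handed an opaque score.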

Why it matters?

This matters because as AI-generated text becomes more common, we need reliable ways to tell it apart from human-written text. ExaGPT not only performs better than previous methods, but it also helps people understand and trust its decisions. This could be especially useful in education, journalism, and other fields where knowing the source of a text is crucial.

Abstract

Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions, such as undermining a student's academic dignity. LLM text detection thus needs to ensure the interpretability of the decision, which can help users judge how reliably correct its prediction is. When humans verify whether a text is human-written or LLM-generated, they intuitively investigate with which of them it shares more similar spans. However, existing interpretable detectors are not aligned with the human decision-making process and fail to offer evidence that users easily understand. To bridge this gap, we introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process for verifying the origin of a text. ExaGPT identifies a text by checking whether it shares more similar spans with human-written vs. with LLM-generated texts from a datastore. This approach can provide similar span examples that contribute to the decision for each span in the text as evidence. Our human evaluation demonstrates that providing similar span examples contributes more effectively to judging the correctness of the decision than existing interpretable methods. Moreover, extensive experiments in four domains and three generators show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.