Machine Text Detectors are Membership Inference Attacks
Ryuto Koike, Liam Dugan, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki
2025-10-23
Summary
This paper explores a surprising connection between two seemingly different areas of artificial intelligence: figuring out whether a given piece of text was used to train an AI model (membership inference attacks) and detecting text written by AI rather than by a human (machine-generated text detection).
What's the problem?
Both of these tasks, identifying training data and spotting AI-written text, rely on similar signals in how a language model assigns probabilities to words (for example, how "surprised" the model is by a given text). However, researchers have treated them as separate problems, potentially missing stronger methods and valuable insights developed in the other field. The core issue is a lack of cross-pollination of ideas between these two research communities.
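To make the shared signal concrete, here is a minimal toy sketch. It uses a hypothetical stand-in for a language model (a fixed table of token probabilities, not a real LM) to show that a single quantity, the average negative log-likelihood of a text under the model, can serve as the score for both tasks; only the interpretation of a high or low score differs.

```python
import math

# Hypothetical "language model": a fixed table of token probabilities.
# Real methods query an actual LM; this stand-in only illustrates the signal.
TOKEN_PROBS = {
    "the": 0.20, "cat": 0.05, "sat": 0.04, "on": 0.15,
    "mat": 0.03, "zyzzyva": 0.0001,
}

def avg_neg_log_likelihood(tokens):
    """Average per-token negative log-probability (the text's log-perplexity)."""
    nll = [-math.log(TOKEN_PROBS.get(t, 1e-6)) for t in tokens]
    return sum(nll) / len(nll)

# The SAME underlying score drives both tasks; only the reading differs:
# - MIA: unusually low loss suggests the sample was seen during training.
# - Detection: unusually low perplexity suggests the text is machine-generated.
def mia_score(tokens):
    return -avg_neg_log_likelihood(tokens)   # higher = more likely a training member

def detection_score(tokens):
    return -avg_neg_log_likelihood(tokens)   # higher = more likely machine text

common = ["the", "cat", "sat", "on", "the", "mat"]
rare = ["zyzzyva", "mat", "zyzzyva"]
assert mia_score(common) > mia_score(rare)
assert detection_score(common) == mia_score(common)
```

Because both scores are the same function of the model's probability distribution, a method that estimates this quantity more accurately should, in principle, improve on both tasks at once.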
What's the solution?
The researchers proved mathematically that the metric achieving the asymptotically highest performance is the same for both tasks. They then tested this idea by applying methods developed for one task to the other, evaluating 7 state-of-the-art MIA methods and 5 state-of-the-art machine-text detectors across 13 domains and 10 text generators. They found a strong connection: how well a method performs on one task predicts how well it will do on the other (rank correlation rho > 0.6). Notably, Binoculars, a tool originally designed to detect AI-generated text, achieved state-of-the-art performance at identifying training data, demonstrating the practical value of this transferability. To support future research, they also release MINT, a unified evaluation suite for both tasks.
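The cross-task agreement is measured with Spearman's rank correlation, which compares how methods *rank* on one task versus the other. The sketch below implements it from scratch and runs it on purely hypothetical AUROC scores (illustrative numbers, not results from the paper) for five imaginary methods evaluated on both tasks.

```python
import math

def rankdata(xs):
    """Ranks (1-based), averaging ranks over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average position, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical per-method AUROC on each task (NOT the paper's numbers).
mia_auroc = [0.72, 0.65, 0.80, 0.60, 0.75]
det_auroc = [0.85, 0.80, 0.92, 0.70, 0.78]
rho = spearman_rho(mia_auroc, det_auroc)
assert abs(rho - 0.7) < 1e-9  # a rho this high means the rankings largely agree
```

A rho above 0.6 on real benchmarks, as the paper reports, means the ranking of methods on one task is a good predictor of their ranking on the other.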
Why does it matter?
This work shows that these two AI security areas aren't as distinct as we thought. By recognizing their common ground, researchers can develop more effective tools and strategies for both protecting privacy (by detecting if your data was used to train an AI) and ensuring authenticity (by identifying AI-generated content). It encourages collaboration and a more unified approach to these important challenges.
Abstract
Although membership inference attacks (MIAs) and machine-generated text detection target different goals (identifying training samples and synthetic texts, respectively), their methods often exploit similar signals based on a language model's probability distribution. Despite this shared methodological foundation, the two tasks have been independently studied, which may lead to conclusions that overlook stronger methods and valuable insights developed in the other task. In this work, we theoretically and empirically investigate the transferability, i.e., how well a method originally developed for one task performs on the other, between MIAs and machine text detection. For our theoretical contribution, we prove that the metric that achieves the asymptotically highest performance on both tasks is the same. We unify a large proportion of the existing literature in the context of this optimal metric and hypothesize that the accuracy with which a given method approximates this metric is directly correlated with its transferability. Our large-scale empirical experiments, including 7 state-of-the-art MIA methods and 5 state-of-the-art machine text detectors across 13 domains and 10 generators, demonstrate very strong rank correlation (rho > 0.6) in cross-task performance. We notably find that Binoculars, originally designed for machine text detection, achieves state-of-the-art performance on MIA benchmarks as well, demonstrating the practical impact of the transferability. Our findings highlight the need for greater cross-task awareness and collaboration between the two research communities. To facilitate cross-task developments and fair evaluations, we introduce MINT, a unified evaluation suite for MIAs and machine-generated text detection, with implementations of 15 recent methods from both tasks.