Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges
Baixiang Huang, Canyu Chen, Kai Shu
2024-08-20

Summary
This paper surveys the challenges and methods of authorship attribution in the context of large language models (LLMs), focusing on how to determine who, human or machine, wrote a piece of text.
What's the problem?
As LLMs become more advanced, it becomes harder to tell whether a text was written by a human or generated by a machine. This blurring of authorship raises concerns about plagiarism, misinformation, and the credibility of digital content, making it crucial to accurately attribute authorship.
What's the solution?
The authors review recent research on authorship attribution and organize it around four main problems: attributing human-written texts to their authors, detecting LLM-generated texts, attributing LLM-generated texts to the specific model that produced them, and handling texts co-authored by humans and LLMs. They also address challenges such as ensuring that attribution methods generalize across different contexts and providing clear explanations for how decisions are made.
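To make the first problem concrete, below is a minimal stylometric baseline in Python: character n-gram TF-IDF features feeding a linear classifier, a standard setup in the attribution literature. This is an illustrative sketch, not a method from the paper; the toy texts and author labels are invented purely for demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus with invented authors; real attribution work uses
# much larger corpora per candidate author.
texts = [
    "I reckon the river was calm that morning.",
    "The river, I reckon, lay perfectly calm.",
    "Quantum effects dominate at sufficiently small scales.",
    "At small enough scales, quantum effects dominate.",
]
authors = ["author_a", "author_a", "author_b", "author_b"]

# Character n-grams capture stylistic habits (punctuation, function
# words, affixes) rather than topic, which is why they are a common
# feature choice in stylometry.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, authors)

# Likely ['author_a'], given the shared "reckon" n-grams.
print(clf.predict(["The calm river lay before me, I reckon."]))
```

The same pipeline shape extends to the detection and LLM-attribution problems by swapping author labels for human-vs-LLM or per-model labels, though such classifiers tend to degrade out of domain, which is where the generalization challenge discussed above bites.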
Why it matters?
This research is important because it helps maintain the integrity of written content in a world where AI-generated text is becoming common. By improving authorship attribution methods, we can better combat plagiarism and misinformation, ensuring that original authors receive proper credit for their work.
Abstract
Accurate attribution of authorship is crucial for maintaining the integrity of digital content, improving forensic investigations, and mitigating the risks of misinformation and plagiarism. Proper authorship attribution is also essential for upholding the credibility and accountability of authentic authorship. The rapid advancement of Large Language Models (LLMs) has blurred the lines between human and machine authorship, posing significant challenges for traditional methods. We present a comprehensive literature review that examines the latest research on authorship attribution in the era of LLMs. This survey systematically explores the landscape of the field by categorizing four representative problems: (1) Human-written Text Attribution; (2) LLM-generated Text Detection; (3) LLM-generated Text Attribution; and (4) Human-LLM Co-authored Text Attribution. We also discuss the challenges of ensuring the generalization and explainability of authorship attribution methods: generalization requires methods to perform reliably across diverse domains, while explainability emphasizes providing transparent and understandable insights into the decisions these models make. By evaluating the strengths and limitations of existing methods and benchmarks, we identify key open problems and future research directions in this field. This literature review serves as a roadmap for researchers and practitioners interested in understanding the state of the art in this rapidly evolving field. Additional resources and a curated list of papers are available and regularly updated at https://llm-authorship.github.io
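As a complementary illustration for the LLM-generated text detection problem, one widely used zero-shot signal is perplexity under a scoring language model: machine-generated text often scores as more predictable (lower perplexity) than human writing. The sketch below computes this with Hugging Face transformers; the gpt2 scoring model and the cutoff value are assumptions chosen for illustration, not the survey's method, and any real detector would calibrate them on labeled data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(text: str, model, tokenizer) -> float:
    """Perplexity of `text` under a causal LM (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # loss = mean cross-entropy
    return torch.exp(out.loss).item()

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical cutoff for illustration only; real detectors calibrate
# thresholds (or train classifiers) on labeled human/LLM corpora.
THRESHOLD = 20.0

ppl = perplexity("The quick brown fox jumps over the lazy dog.", model, tokenizer)
label = "likely LLM-generated" if ppl < THRESHOLD else "likely human-written"
print(f"perplexity={ppl:.1f} -> {label}")
```

Signals like this are brittle in practice (paraphrasing and decoding choices shift perplexity), which is one reason the survey highlights generalization and explainability as open challenges.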