SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Yung-Sung Chuang, Benjamin Cohen-Wang, Shannon Zejiang Shen, Zhaofeng Wu, Hu Xu, Xi Victoria Lin, James Glass, Shang-Wen Li, Wen-tau Yih
2025-02-14
Summary
This paper introduces SelfCite, a method that makes large language models (LLMs) better at citing their sources when they give answers, without needing humans to manually check and correct the citations.
What's the problem?
When AI language models give long answers, they often don't indicate where their information comes from. This makes it hard to trust their responses or to check whether they're accurate. Usually, improving this requires a lot of human effort to teach the model how to cite correctly.
What's the solution?
The researchers created SelfCite, which teaches the model to evaluate its own citations. It works through a simple ablation test: if a citation is necessary, removing the cited text from the context should change the model's answer; if the citation is sufficient, keeping only the cited text should preserve the same answer. This self-generated reward lets the model learn to produce better citations without constant human supervision.
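The ablation test above can be sketched in a few lines. Everything here is illustrative: `log_prob` stands in for the LLM's log-probability of a response given a context (stubbed with a toy word-overlap model), and the reward combines the two checks described in the paper, a probability drop when the cited text is removed and a probability hold when only the cited text is kept.

```python
def log_prob(response, context):
    # Toy stand-in for an LLM's log P(response | context):
    # 0 when every response word appears in the context, negative otherwise.
    words = response.split()
    hits = sum(w in context for w in words)
    return hits - len(words)

def ablation_reward(response, full_context, cited_text):
    base = log_prob(response, full_context)
    # Necessity: dropping the cited text should lower the response probability.
    without_cited = full_context.replace(cited_text, "")
    prob_drop = base - log_prob(response, without_cited)
    # Sufficiency: the cited text alone should preserve the response probability.
    prob_hold = log_prob(response, cited_text) - base
    return prob_drop + prob_hold

context = "The Eiffel Tower is in Paris. It opened in 1889."
good = ablation_reward("opened in 1889", context, "It opened in 1889.")
bad = ablation_reward("opened in 1889", context, "The Eiffel Tower is in Paris.")
# good > bad: the reward favors the citation that actually supports the answer.
```

With a real LLM, `log_prob` would be computed from the model's token probabilities, but the reward structure is the same.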
Why does it matter?
This matters because it could make AI-generated information more trustworthy and easier to verify. Better citations mean people can check where the AI got its facts, which is crucial for using AI in education, research, or any field where accuracy is important. It also saves time and money by reducing the need for humans to constantly check and correct AI citations.
Abstract
We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. Instead of only relying on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through context ablation: If a citation is necessary, removing the cited text from the context should prevent the same response; if sufficient, retaining the cited text alone should preserve the same response. This reward can guide the inference-time best-of-N sampling strategy to improve citation quality significantly, as well as be used in preference optimization to directly fine-tune the models for generating better citations. The effectiveness of SelfCite is demonstrated by increasing citation F1 up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks.
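The best-of-N sampling strategy mentioned in the abstract can be sketched minimally: sample several candidate citations for a statement and keep the one with the highest ablation reward. The reward function below is a toy placeholder for the paper's context-ablation reward, and the candidate list is a hypothetical example.

```python
def best_of_n(candidates, reward):
    # Score every sampled citation candidate and keep the highest-reward one.
    return max(candidates, key=reward)

# Toy demo: two candidate citation spans for the statement "opened in 1889";
# the placeholder reward prefers the span that contains the answer.
candidates = ["The Eiffel Tower is in Paris.", "It opened in 1889."]
best = best_of_n(candidates, reward=lambda c: 1 if "1889" in c else 0)
# best == "It opened in 1889."
```

The same reward can also serve as a preference signal for fine-tuning, as the abstract notes, by preferring high-reward citations over low-reward ones.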