AttnTrace: Attention-based Context Traceback for Long-Context LLMs
Yanting Wang, Runpeng Geng, Ying Chen, Jinyuan Jia
2025-08-06
Summary
This paper introduces AttnTrace, a method that uses the attention weights inside large language models to trace which parts of a long input contributed most to the generated output.
What's the problem?
When a language model processes a very long input, it is difficult and computationally expensive to determine exactly which pieces of that input influenced the output, especially when trying to detect harmful or injected instructions hidden in the context.
What's the solution?
AttnTrace addresses this by using attention weights, the internal scores that indicate how much focus the model gives to each part of the input, to efficiently trace back and highlight the input segments most responsible for the output, making the traceback process both faster and more accurate.
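The core idea of attention-based traceback can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: it assumes you already have an attention matrix (e.g. averaged over heads and layers) mapping generated tokens to input tokens, and it ranks input segments by the average attention mass they receive. The function `trace_context` and the segment boundaries are hypothetical names for illustration only.

```python
import numpy as np

def trace_context(attn, segment_bounds, top_k=2):
    """Rank input segments by the attention mass that generated
    tokens place on them.

    attn           -- (gen_len, input_len) matrix of attention weights
    segment_bounds -- list of (start, end) token index pairs per segment
    top_k          -- number of most-influential segments to return
    """
    scores = []
    for start, end in segment_bounds:
        # mean attention from all generated tokens to this segment
        scores.append(attn[:, start:end].mean())
    ranked = np.argsort(scores)[::-1]  # highest attention first
    return ranked[:top_k], scores

# Toy example: 4 generated tokens attending over 6 input tokens,
# split into three 2-token segments.
attn = np.array([
    [0.05, 0.05, 0.40, 0.30, 0.10, 0.10],
    [0.10, 0.10, 0.35, 0.25, 0.10, 0.10],
    [0.05, 0.05, 0.45, 0.25, 0.10, 0.10],
    [0.10, 0.10, 0.30, 0.30, 0.10, 0.10],
])
bounds = [(0, 2), (2, 4), (4, 6)]
top, scores = trace_context(attn, bounds)
print(top)  # segment 1 (tokens 2-4) receives the most attention
```

Because the attention weights are already computed during generation, this kind of traceback avoids the repeated forward passes that perturbation-based attribution methods require, which is where the efficiency gain described above comes from.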
Why it matters?
This matters because it makes large language models safer and more interpretable: quickly identifying and explaining the input sources behind a generated response is especially useful for detecting problems like prompt injection attacks.
Abstract
AttnTrace is a new context traceback method that uses attention weights to improve both the accuracy and the efficiency of detecting prompt injection in long-context large language models.