Self-Taught Agentic Long Context Understanding
Yufan Zhuang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu, Yusheng Su, Jingbo Shang, Zicheng Liu, Emad Barsoum
2025-02-25
Summary
This paper introduces AgenticLU, a new approach that helps AI language models better understand and answer complex questions that require processing large amounts of text.
What's the problem?
Large language models often struggle to answer questions that require a deep understanding of long, complicated texts. They have trouble identifying which parts of the text are important and how to use that information to give accurate answers.
What's the solution?
The researchers created AgenticLU, which uses a method called Chain-of-Clarifications. The AI asks itself questions to clarify what it needs to know, then finds the relevant information in the text. They also developed a way to train the AI on this process, so it can learn to carry it out quickly and effectively on its own.
Why it matters?
This matters because it could make AI assistants much better at handling complex questions about long documents. That would be useful in education, research, or any setting where people need to extract accurate information from large amounts of text quickly.
Abstract
Answering complex, long-context questions remains a major challenge for large language models (LLMs) as it requires effective question clarifications and context retrieval. We propose Agentic Long-Context Understanding (AgenticLU), a framework designed to enhance an LLM's understanding of such queries by integrating targeted self-clarification with contextual grounding within an agentic workflow. At the core of AgenticLU is Chain-of-Clarifications (CoC), where models refine their understanding through self-generated clarification questions and corresponding contextual groundings. By scaling inference as a tree search where each node represents a CoC step, we achieve 97.8% answer recall on NarrativeQA with a search depth of up to three and a branching factor of eight. To amortize the high cost of this search process into training, we leverage the preference pairs for each step obtained from the CoC workflow and perform two-stage model finetuning: (1) supervised finetuning to learn effective decomposition strategies, and (2) direct preference optimization to enhance reasoning quality. This enables AgenticLU models to generate clarifications and retrieve relevant context effectively and efficiently in a single inference pass. Extensive experiments across seven long-context tasks demonstrate that AgenticLU significantly outperforms state-of-the-art prompting methods and specialized long-context LLMs, achieving robust multi-hop reasoning while sustaining consistent performance as context length grows.
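The tree search described in the abstract, where each node is one Chain-of-Clarifications step and the tree is bounded by a depth and branching factor, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper functions `propose_clarifications`, `ground_in_context`, and `score_node` are hypothetical stubs standing in for LLM calls, and the beam-style pruning is an assumption to keep the example bounded.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for LLM calls; the real system would prompt the model.
def propose_clarifications(question: str, context: str, n: int) -> List[str]:
    # Generate n self-clarification questions (stubbed as numbered variants).
    return [f"{question} [clarification {i}]" for i in range(n)]

def ground_in_context(clarification: str, context: str) -> str:
    # Retrieve a context snippet relevant to the clarification (stubbed).
    return context[:40]

def score_node(clarifications: List[str], groundings: List[str]) -> float:
    # Placeholder path score; the paper judges paths by answer recall.
    return float(len(groundings))

@dataclass
class CoCNode:
    # One path through the tree: accumulated clarifications and groundings.
    clarifications: List[str] = field(default_factory=list)
    groundings: List[str] = field(default_factory=list)

def coc_tree_search(question: str, context: str,
                    depth: int = 3, branching: int = 8) -> CoCNode:
    """Expand a Chain-of-Clarifications tree and return the best path."""
    frontier = [CoCNode()]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            # Each node branches into `branching` clarification questions,
            # each grounded in the long context.
            for clar in propose_clarifications(question, context, branching):
                grounding = ground_in_context(clar, context)
                next_frontier.append(CoCNode(
                    node.clarifications + [clar],
                    node.groundings + [grounding],
                ))
        # Keep only the most promising paths so the tree stays tractable.
        next_frontier.sort(
            key=lambda n: score_node(n.clarifications, n.groundings),
            reverse=True)
        frontier = next_frontier[:branching]
    return frontier[0]
```

In the paper's pipeline, the preferred and dispreferred branches at each step would then form the preference pairs used for supervised finetuning and direct preference optimization, so that the finetuned model reproduces a good path in a single pass without the search.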