Thus Spake Long-Context Large Language Model
Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu
2025-02-25
Summary
This paper surveys the development and challenges of long-context Large Language Models (LLMs), AI systems that can understand and process very large amounts of text at once.
What's the problem?
While giving AI models the ability to work with long texts (like entire books) could make them much more powerful and human-like, this is very hard to achieve. Current models struggle to use all the information in very long texts effectively, often forgetting or misusing parts of what they've read.
What's the solution?
The researchers surveyed the different ways people are trying to make LLMs better at handling long texts. They studied four main areas: how the AI is built (architecture), the systems that train and run it (infrastructure), how it's taught (training), and how it's tested (evaluation). They compared this journey to the symphonic poem Thus Spake Zarathustra, likening an AI's struggle to overcome its limited context to humanity's attempt to transcend its own limits.
Why it matters?
This matters because AI that truly understands long texts could revolutionize how we interact with information. Such models could read and understand entire books or databases at once, making them incredibly useful for research, education, and problem-solving. The survey also organizes the current research on this topic, which could speed up progress and help researchers focus on the most promising ideas.
Abstract
Long context is an important topic in Natural Language Processing (NLP), running through the development of NLP architectures, and it offers immense opportunities for Large Language Models (LLMs), giving them lifelong learning potential akin to that of humans. Unfortunately, the pursuit of long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage for LLMs. In the past two years, the context length of LLMs has achieved a breakthrough extension to millions of tokens. Moreover, research on long-context LLMs has expanded from length extrapolation to a comprehensive focus on architecture, infrastructure, training, and evaluation technologies. Inspired by the symphonic poem Thus Spake Zarathustra, we draw an analogy between the journey of extending the context of an LLM and humanity's attempts to transcend mortality. In this survey, we illustrate how an LLM struggles between the tremendous need for a longer context and the equal need to accept that its context is ultimately finite. To this end, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation, showcasing the full spectrum of long-context technologies. At the end of this survey, we present 10 unanswered questions currently faced by long-context LLMs. We hope this survey can serve as a systematic introduction to research on long-context LLMs.
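As a concrete taste of the length-extrapolation line of work the abstract mentions, below is a minimal sketch of position interpolation for rotary position embeddings (RoPE), one widely used way to stretch a model's context beyond its training length. The function names and parameters here are illustrative assumptions for this sketch, not code from the survey.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary-embedding rotation angles for each position and feature pair."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape: (num_positions, dim // 2)

def interpolated_positions(seq_len, train_len):
    """Position interpolation: rescale positions beyond the training window
    back into [0, train_len) so RoPE never sees angles it was not trained on."""
    positions = np.arange(seq_len, dtype=np.float64)
    if seq_len <= train_len:
        return positions
    return positions * (train_len / seq_len)

# Hypothetical example: a model trained on 4k tokens applied to a 16k input.
angles = rope_angles(interpolated_positions(16384, 4096), dim=128)
```

The key design choice is interpolating within the seen position range rather than extrapolating past it, which in practice degrades attention far less; the survey covers this and many related context-extension techniques in depth.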