Recursive Language Models

Alex L. Zhang, Tim Kraska, Omar Khattab

2026-01-06

Summary

This paper explores a new way to let large language models, like those powering chatbots, handle pieces of text that are far longer than they can normally read at once.

What's the problem?

Large language models have a limit to how much text they can process at once, called a 'context window'. When you give them something longer than that, they struggle to understand the whole thing and give good answers. Existing methods to deal with long texts often don't work very well or are expensive.

What's the solution?

The researchers propose a technique called Recursive Language Models, or RLMs. Instead of trying to read the entire long text at once, the LLM treats it as an external environment: it breaks the text into smaller chunks, examines them, and then essentially 'calls itself' on different parts of the input. In effect, the LLM programs its own step-by-step analysis of the long text, using its own abilities to decide what is important and how the pieces connect.
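To make the recursive idea concrete, here is a rough Python sketch of one way such a loop could work. It is a minimal illustration under assumptions of our own, not the paper's implementation: the `llm` helper, the fixed chunk size, and the prompts are all hypothetical, whereas in RLMs the model itself programmatically decides how to examine and decompose the prompt.

```python
# Minimal sketch of the recursive-call idea; all names and prompts here are
# illustrative assumptions, not the paper's actual code.

CHUNK_CHARS = 8_000  # keep every individual call well inside the context window


def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. a chat-completion API request)."""
    raise NotImplementedError("wire this up to an actual model")


def recursive_answer(question: str, text: str) -> str:
    """Answer `question` about `text`, recursing when `text` is too long."""
    # Base case: the text already fits comfortably, so answer directly.
    if len(text) <= CHUNK_CHARS:
        return llm(f"Context:\n{text}\n\nQuestion: {question}\nAnswer:")

    # Recursive case: split the input, ask (recursively) what each chunk
    # contributes to the question, then combine those shorter notes.
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    notes = [
        recursive_answer(f"What in this excerpt is relevant to: {question}", chunk)
        for chunk in chunks
    ]
    # Each note is an LLM answer, so it is much shorter than the chunk it came
    # from; recursing on the combined notes keeps shrinking the input.
    return recursive_answer(question, "\n\n".join(notes))
```

The point the sketch tries to capture is that no single call ever sees more text than fits in the context window, and the combining step is itself just another (possibly recursive) call, which is what lets the overall input grow far beyond the window.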

Why it matters?

This is important because it allows LLMs to work with much longer documents – up to 100 times more text than they could before! Not only that, but RLMs actually give *better* answers than current methods, and can even be cheaper to use. This opens up possibilities for using LLMs with things like entire books, research papers, or long conversations, making them much more useful in real-world applications.

Abstract

We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs and common long-context scaffolds across four diverse long-context tasks, while having comparable (or cheaper) cost per query.