RePo: Language Models with Context Re-Positioning
Huayang Li, Tianyu Zhao, Richard Sproat
2025-12-17
Summary
This paper introduces a new way for large language models (LLMs) to understand the order of information they're given, aiming to make them better at using context to solve problems.
What's the problem?
Current LLMs treat the order of words in a prompt very simply, usually just assigning them sequential numbers (0, 1, 2, ...). The researchers argue this is inefficient because it tells the model nothing about *how* different parts of the context relate to each other, forcing the model to spend processing capacity figuring out these relationships instead of focusing on the actual task. It's like trying to follow a complex story whose sentences carry no cues about how they connect – making sense of it takes extra effort.
What's the solution?
The researchers developed a system called RePo that learns a more intelligent way to position words within a prompt. Instead of fixed sequential numbers, RePo uses a small trainable module that assigns each token a position based on how it connects to the rest of the context. It's as if RePo rearranges the sentences in a story to highlight the most important connections, making it easier to understand. They continually pre-trained this system on the OLMo-2 1B language model and showed it improves performance, especially when the context is noisy, structured like a table, or very long.
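To make the idea concrete, here is a minimal sketch of what "learned re-positioning" could look like. This is an illustrative assumption, not the paper's actual architecture: a hypothetical module f_phi maps each token's hidden state to a positive increment, and the cumulative sum of those increments gives dense, real-valued positions (instead of the fixed integers 0, 1, 2, ...), which a standard sinusoidal encoding can then consume.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_phi(hidden, W, b):
    # Hypothetical re-positioning module: map each token's hidden state
    # to a positive increment via softplus, then cumulatively sum the
    # increments. The result is a monotone but non-integer, non-uniform
    # position for each token -- tokens the module deems closely related
    # can end up with nearly identical positions.
    increments = np.log1p(np.exp(hidden @ W + b))  # softplus, always > 0
    return np.cumsum(increments)

def sinusoidal_encoding(pos, dim):
    # Standard sinusoidal position encoding, evaluated at real-valued
    # positions rather than integers.
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000.0 ** (2.0 * i / dim))
    angles = pos[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

seq_len, hid_dim, enc_dim = 6, 8, 4
hidden = rng.normal(size=(seq_len, hid_dim))   # stand-in hidden states
W = rng.normal(size=hid_dim) * 0.1             # toy parameters of f_phi
b = 0.0

positions = f_phi(hidden, W, b)                # learned, non-integer positions
encoding = sinusoidal_encoding(positions, enc_dim)
```

Because softplus and cumsum are differentiable, gradients can flow from the language-modeling loss back into f_phi's parameters, which is what lets the positions be learned end to end; the module name, architecture, and the sinusoidal choice here are all placeholders.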
Why does it matter?
This work is important because it could lead to LLMs that are more efficient and better at reasoning. By reducing the mental effort needed to understand context, the model can dedicate more resources to actually solving the problem at hand. This is especially crucial as we try to build LLMs that can handle increasingly complex tasks and larger amounts of information.
Abstract
In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant positional indices. Drawing on Cognitive Load Theory (CLT), we argue that this uninformative structure increases extraneous cognitive load, consuming finite working memory capacity that should be allocated to deep reasoning and attention allocation. To address this, we propose RePo, a novel mechanism that reduces extraneous load via context re-positioning. Unlike standard approaches, RePo utilizes a differentiable module, f_φ, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined integer range. By continually pre-training on the OLMo-2 1B backbone, we demonstrate that RePo significantly enhances performance on tasks involving noisy contexts, structured data, and longer context lengths, while maintaining competitive performance on general short-context tasks. Detailed analysis reveals that RePo successfully allocates higher attention to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is available at https://github.com/SakanaAI/repo.