Attention Basin: Why Contextual Position Matters in Large Language Models

Zihao Yi, Delong Zeng, Zhenqing Ling, Haohao Luo, Zhe Xu, Wei Liu, Jian Luan, Wanxia Cao, Ying Shen

2025-08-08

Summary

This paper shows that Large Language Models (LLMs) pay different amounts of attention to information depending on its position in the input, revealing a pattern called the attention basin: models focus more on the beginning and end of a sequence while neglecting the middle.

What's the problem?

LLMs do not treat all parts of the input equally, so important information placed in the middle of a sequence can be overlooked, reducing the model's overall performance.

What's the solution?

The authors developed a method called Attention-Driven Reranking (AttnRank), which reorders input sequences so that the most important information is placed where the model naturally pays the most attention, improving results without changing the model itself.
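The reordering idea can be sketched in a few lines. This is my own illustration, not the authors' code: it assumes we already have per-position attention weights from a calibration run (stage one of the framework) and a salience score for each document (e.g., a retriever score), and it simply assigns the most salient items to the most-attended positions.

```python
# Hedged sketch of an AttnRank-style reordering step (illustrative, not the
# authors' implementation). Inputs are assumed to be precomputed:
#   - position_attention: how much attention the model gives each slot
#     (a U-shaped "basin" profile: high at the ends, low in the middle)
#   - salience: an importance score per document
def attnrank_reorder(documents, salience, position_attention):
    """Reorder documents so the most salient ones occupy the slots
    the model attends to most."""
    assert len(documents) == len(salience) == len(position_attention)
    # Slots sorted from most- to least-attended.
    pos_by_attention = sorted(range(len(position_attention)),
                              key=lambda p: position_attention[p], reverse=True)
    # Documents sorted from most to least salient.
    docs_by_salience = sorted(range(len(documents)),
                              key=lambda d: salience[d], reverse=True)
    reordered = [None] * len(documents)
    for pos, doc in zip(pos_by_attention, docs_by_salience):
        reordered[pos] = documents[doc]
    return reordered

# Toy example with a basin-shaped attention profile over 5 slots:
docs = ["d0", "d1", "d2", "d3", "d4"]
scores = [0.9, 0.2, 0.7, 0.1, 0.5]      # hypothetical salience scores
attn = [0.30, 0.10, 0.05, 0.15, 0.40]   # ends get the most attention
print(attnrank_reorder(docs, scores, attn))
# → ['d2', 'd1', 'd3', 'd4', 'd0']  (most salient d0, d2 land at the ends)
```

Because only the ordering of the input changes, a step like this needs no gradient updates or access to model weights, which matches the paper's description of AttnRank as training-free and plug-and-play.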

Why does it matter?

This matters because it helps LLMs make better use of the information they receive, which leads to improved results in tasks like answering complex questions or learning from few examples, without needing extra training or complicated adjustments.

Abstract

The performance of Large Language Models (LLMs) is significantly sensitive to the contextual position of information in the input. To investigate the mechanism behind this positional bias, our extensive experiments reveal a consistent phenomenon we term the attention basin: when presented with a sequence of structured items (e.g., retrieved documents or few-shot examples), models systematically assign higher attention to the items at the beginning and end of the sequence, while neglecting those in the middle. Crucially, our analysis further reveals that allocating higher attention to critical information is key to enhancing model performance. Based on these insights, we introduce Attention-Driven Reranking (AttnRank), a two-stage framework that (i) estimates a model's intrinsic positional attention preferences using a small calibration set, and (ii) reorders retrieved documents or few-shot examples to align the most salient content with these high-attention positions. AttnRank is a model-agnostic, training-free, and plug-and-play method with minimal computational overhead. Experiments on multi-hop QA and few-shot in-context learning tasks demonstrate that AttnRank achieves substantial improvements across 10 large language models of varying architectures and scales, without modifying model parameters or training procedures.