Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

Xiaoran Liu, Yuerong Song, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Zhaoxiang Liu, Shiguo Lian, Ziwei He, Xipeng Qiu

2025-12-09

Summary

This paper focuses on improving how Large Language Models (LLMs) understand the order of words in a sentence, specifically when dealing with very long pieces of text.

What's the problem?

LLMs use a technique called RoPE (Rotary Position Embeddings) to keep track of word order. RoPE works by rotating the vector representations of words in the complex plane, but standard implementations use only part of the information from this rotation: they keep the 'real' part of the calculation and throw away the 'imaginary' part, which potentially loses important details about how words relate to each other, especially when the text is long.
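To make the "real part only" point concrete, here is a minimal sketch (plain NumPy, not the paper's code) of how a standard RoPE attention score can be viewed as the real part of a complex-valued inner product between a rotated query and key; the head dimension and positions are illustrative.

```python
# Minimal sketch: standard RoPE viewed in the complex plane.
# Consecutive dimension pairs of q and k are treated as complex numbers and
# rotated by a position-dependent angle; the usual attention score keeps only
# the real part of the resulting complex inner product.
import numpy as np

def to_complex(x):
    """Interpret dimension pairs (x0, x1), (x2, x3), ... as complex numbers."""
    return x[0::2] + 1j * x[1::2]

def rope_rotate(x_c, pos, theta):
    """Rotate each complex component by pos * theta_j (the RoPE rotation)."""
    return x_c * np.exp(1j * pos * theta)

d = 8                                                # head dimension (illustrative)
theta = 10000.0 ** (-np.arange(d // 2) / (d // 2))   # standard RoPE frequencies
q, k = np.random.randn(d), np.random.randn(d)
m, n = 5, 2                                          # query and key positions

q_rot = rope_rotate(to_complex(q), m, theta)
k_rot = rope_rotate(to_complex(k), n, theta)
score_complex = np.sum(q_rot * np.conj(k_rot))       # depends only on (m - n)

# Standard RoPE attention uses only the real part and discards the imaginary part:
score_real = score_complex.real
print(score_real, score_complex.imag)
```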

What's the solution?

The researchers developed a new method that puts the discarded 'imaginary' part of the RoPE calculation back to use. Instead of a single number representing how strongly two words connect, the attention score uses two numbers, the real and imaginary components, capturing more of the positional information. They show both mathematically and through experiments that this approach helps the model better capture long-range relationships between words.
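The sketch below illustrates the core idea schematically: keep the imaginary component as a second attention signal instead of discarding it. How the two components are actually combined in the paper may differ; the simple weighted sum with a placeholder coefficient `alpha` here is an assumption for illustration only.

```python
# Schematic sketch of a dual-component RoPE attention score (not the paper's
# exact formulation): both the real and the imaginary part of the complex
# inner product contribute to the score. The mixing rule (weighted sum with a
# hypothetical coefficient `alpha`) is an assumption made for this example.
import numpy as np

def complex_rope_score(q, k, m, n, theta):
    """Return the (real, imaginary) parts of the complex RoPE inner product."""
    q_c = (q[0::2] + 1j * q[1::2]) * np.exp(1j * m * theta)
    k_c = (k[0::2] + 1j * k[1::2]) * np.exp(1j * n * theta)
    s = np.sum(q_c * np.conj(k_c))
    return s.real, s.imag

d = 8
theta = 10000.0 ** (-np.arange(d // 2) / (d // 2))
q, k = np.random.randn(d), np.random.randn(d)

re_score, im_score = complex_rope_score(q, k, m=5, n=2, theta=theta)

alpha = 0.5                                 # hypothetical mixing weight; in practice
                                            # it could be fixed or learned per head
dual_score = re_score + alpha * im_score    # dual-component attention score
print(re_score, im_score, dual_score)
```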

Why it matters?

This is important because it allows LLMs to process and understand longer texts more effectively. As the length of text a model can handle increases, maintaining understanding of word order becomes more challenging. This new method consistently improves performance on tasks that require understanding long contexts, meaning models can generate more coherent and accurate responses when given a lot of information.

Abstract

Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the real component of the complex-valued dot product for attention score calculation. This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-context dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. Our method leverages the full complex-valued representation to create a dual-component attention score. We theoretically and empirically demonstrate that this approach enhances the modeling of long-context dependencies by preserving more positional information. Furthermore, evaluations on a suite of long-context language modeling benchmarks show that our method consistently improves performance over the standard RoPE, with the benefits becoming more significant as context length increases. The code is available at https://github.com/OpenMOSS/rope_pp.
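In equation form (the notation below is ours and simplified, not taken from the paper): writing the query and key dimension pairs at positions $m$ and $n$ as complex numbers $\tilde{q}_j$ and $\tilde{k}_j$ with RoPE frequencies $\theta_j$, the complex-valued attention score decomposes as

$$
s_{m,n} \;=\; \sum_{j} \tilde{q}_j\,\overline{\tilde{k}_j}\; e^{i(m-n)\theta_j}
\;=\; \underbrace{\operatorname{Re}\, s_{m,n}}_{\text{standard RoPE score}}
\;+\; i\,\underbrace{\operatorname{Im}\, s_{m,n}}_{\text{phase term re-incorporated here}},
$$

where standard implementations keep only the real part, and the proposed extension also makes use of the imaginary part.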