Group Representational Position Encoding

Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao

2025-12-09

Summary

This paper introduces GRAPE, a new way to handle positional encoding in models that deal with sequences like text, aiming to improve how these models understand the order of information.

What's the problem?

When processing sequences, models need to know the position of each element: is it the first word, the second, and so on? Traditional methods for encoding this positional information can struggle with very long sequences, becoming computationally expensive or losing accuracy as the sequence grows. Existing methods like RoPE and ALiBi are limited in how they capture relationships between positions and how efficiently they scale.

What's the solution?

GRAPE provides a flexible framework that combines two main mechanisms: one uses rotations (like spinning vectors) and the other uses additive biases (like shifting attention scores). It is designed to be more efficient and adaptable than previous methods. The rotations let the model reason about relative positions while preserving important properties such as the length of the vectors representing words. The additive biases offer a simpler way to represent position, similar to ALiBi, but with improvements in how long sequences are handled. Notably, GRAPE recovers RoPE and ALiBi exactly as special cases, showing that it is a more general framework.
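To make the two mechanisms concrete, here is a minimal NumPy sketch of the special cases the paper says GRAPE recovers: a RoPE-style rotation applied plane by plane, and an ALiBi-style additive logit bias. The function names and the fixed `base`/`slope` values are illustrative choices of mine, not from the paper.

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    """RoPE-style rotation: rotate each 2D plane (x1[i], x2[i]) of x by
    angle pos * omega[i]. This is the canonical special case of GRAPE's
    multiplicative (rotation) branch."""
    half = x.shape[-1] // 2
    omega = base ** (-np.arange(half) / half)  # log-uniform frequency spectrum
    theta = pos * omega
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * np.cos(theta) - x2 * np.sin(theta),
                           x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)

def alibi_bias(n_queries, n_keys, slope=0.5):
    """ALiBi-style additive logit bias: a penalty proportional to the
    query-key distance, added directly to the attention scores. This is
    the simplest case of GRAPE's additive branch."""
    q = np.arange(n_queries)[:, None]
    k = np.arange(n_keys)[None, :]
    return -slope * np.abs(q - k)
```

The rotation is norm-preserving and relative: the attention score between a query rotated to position m and a key rotated to position n depends only on the offset m - n, which is why this family handles order compositionally.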

Why it matters?

GRAPE is important because it offers a more principled and versatile way to encode positional information, especially for models dealing with very long sequences of data. By providing a unified framework, it allows researchers to explore different positional encoding strategies more easily and potentially build more powerful and efficient models for tasks like natural language processing.

Abstract

We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in SO(d) and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group GL. In Multiplicative GRAPE, a position n in Z (or t in R) acts as G(n) = exp(nωL) with a rank-2 skew generator L in R^{d×d}, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the d/2 planes are the canonical coordinate pairs with log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at O(d) and O(rd) cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases. Project Page: https://github.com/model-architectures/GRAPE.
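The closed-form matrix exponential claimed in the abstract can be checked numerically. For a rank-2 skew generator L = ba^T - ab^T built from orthonormal vectors a, b, one has L^3 = -L, so exp(θL) collapses to a Rodrigues-style formula that rotates only inside span{a, b}. The sketch below is my own verification under those assumptions, not code from the paper.

```python
import numpy as np

def rank2_skew(a, b):
    """Rank-2 skew generator L = b a^T - a b^T.
    Assumes a and b are orthonormal, so that L^3 = -L holds."""
    return np.outer(b, a) - np.outer(a, b)

def grape_rotation(L, theta):
    """Closed-form exp(theta * L) for a rank-2 skew generator with L^3 = -L:
    exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L^2.
    The result is orthogonal and acts as a rotation in the plane span{a, b}."""
    I = np.eye(L.shape[0])
    return I + np.sin(theta) * L + (1.0 - np.cos(theta)) * (L @ L)
```

Because the generator is fixed and only the angle scales with position, these maps compose additively, exp(mωL) exp(nωL) = exp((m + n)ωL), which is exactly the relative, compositional law the abstract describes.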