Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

Rishiraj Acharya

2025-09-03

Summary

This paper introduces a new way to process sequences of data, like sentences, that aims to be faster and more efficient than the current standard method, called the Transformer.

What's the problem?

The Transformer is really good at understanding sequences, but it gets incredibly slow when dealing with very long sequences. This is because the way it works requires comparing every part of the sequence to every other part, and the amount of work increases dramatically as the sequence gets longer – specifically, it increases with the square of the sequence length. This makes it hard to use Transformers for tasks that need to consider a lot of context.
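The difference in growth rates can be made concrete with a toy operation count (a simplified model of cost, not the paper's measurements): self-attention does one comparison per pair of tokens, while a linear-time model does a fixed amount of work per token.

```python
# Toy cost model: self-attention compares every token to every other
# token, so its work grows with the square of the sequence length N;
# a linear-time model does a fixed amount of work per token.
def attention_ops(n):
    return n * n  # one comparison per (query, key) pair

def linear_ops(n):
    return n  # one fixed-cost update per token

for n in [1_000, 10_000]:
    print(n, attention_ops(n), linear_ops(n))

# Going from N = 1,000 to N = 10,000 multiplies attention's work
# by 100x, but a linear model's work by only 10x.
```

This is why quadratic scaling becomes the bottleneck for long contexts: each doubling of the sequence length quadruples attention's cost while only doubling a linear model's.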

What's the solution?

The researchers created a new architecture called Gated Associative Memory, or GAM. Instead of comparing everything to everything, GAM uses two main approaches that work in parallel. One approach quickly looks at nearby parts of the sequence to understand local relationships. The other approach focuses on remembering and retrieving important patterns from across the entire sequence. A 'gate' then decides how much to focus on the local information versus the global patterns for each part of the sequence. This design allows GAM to process sequences much faster, in a time that increases linearly with the sequence length, instead of quadratically.
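The two-pathway design described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names, the fixed memory bank, and the single-vector gate are all simplifying assumptions made here. It shows the essential structure: a causal convolution for local context, a softmax lookup into a bank of memory slots for global patterns, and a per-token gate blending the two, with total work linear in the sequence length.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gam_block(tokens, conv_kernel, mem_keys, mem_values, gate_w):
    """Illustrative GAM-style block (hypothetical shapes and parameters).

    tokens: list of N d-dimensional vectors; returns N d-dimensional vectors.
    Work is O(N * k) for the convolution plus O(N * M) for the memory
    lookup (k = kernel size, M = number of memory slots) -- linear in N.
    """
    out = []
    for t, x in enumerate(tokens):
        # Local pathway: causal convolution (only looks at current
        # and earlier positions, never future ones).
        local = [0.0] * len(x)
        for j, w in enumerate(conv_kernel):
            if t - j >= 0:
                prev = tokens[t - j]
                local = [l + w * p for l, p in zip(local, prev)]
        # Global pathway: content-based retrieval from a memory bank,
        # weighting each stored value by the token's match to its key.
        scores = softmax([dot(x, key) for key in mem_keys])
        glob = [0.0] * len(x)
        for s, val in zip(scores, mem_values):
            glob = [g + s * v for g, v in zip(glob, val)]
        # Gate: decide, per token, how much local vs. global to use.
        g = sigmoid(dot(x, gate_w))
        out.append([g * l + (1.0 - g) * gl for l, gl in zip(local, glob)])
    return out
```

Note that nothing in the loop body depends on results from other positions' outputs, so in a real implementation every token can be processed in parallel, unlike a recurrent model.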

Why it matters?

GAM offers a significant improvement in speed and efficiency compared to Transformers and other recent fast sequence models, like Mamba. The experiments show it trains faster than both baselines while achieving similar or better validation perplexity, a standard measure of how well a model predicts text. This means GAM could unlock the ability to process much longer sequences, leading to better performance in tasks like understanding long documents, complex stories, or extended conversations.

Abstract

The Transformer architecture, underpinned by the self-attention mechanism, has become the de facto standard for sequence modeling tasks. However, its core computational primitive scales quadratically with sequence length (O(N^2)), creating a significant bottleneck for processing long contexts. In this paper, we propose the Gated Associative Memory (GAM) network, a novel, fully parallel architecture for sequence modeling that exhibits linear complexity (O(N)) with respect to sequence length. The GAM block replaces the self-attention layer with two parallel pathways: a causal convolution to efficiently capture local, position-dependent context, and a parallel associative memory retrieval mechanism to model global, content-based patterns. These pathways are dynamically fused using a gating mechanism, allowing the model to flexibly combine local and global information for each token. We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer model and a modern linear-time baseline (Mamba) on the WikiText-2 benchmark, as well as against the Transformer on the TinyStories dataset. Our experiments demonstrate that GAM is consistently faster, outperforming both baselines on training speed, and achieves a superior or competitive final validation perplexity across all datasets, establishing it as a promising and efficient alternative for sequence modeling.