Artificial Hippocampus Networks for Efficient Long-Context Modeling

Yunhao Fang, Weihao Yu, Shu Zhong, Qinghao Ye, Xuehan Xiong, Lai Wei

2025-10-09

Summary

This paper introduces a new way for AI models to handle very long texts, combining the strengths of different architectures to make processing both more efficient and more accurate.

What's the problem?

When dealing with long texts, AI models face a challenge: models like RNNs are fast but can 'forget' information from earlier in the text, while models like Transformers remember everything but require a lot of computing power and memory. Essentially, it's hard to be both efficient *and* keep track of all the details in a long sequence.

What's the solution?

The researchers were inspired by how human memory works, specifically the idea of short-term and long-term memory. They created a system where the model keeps a recent 'window' of the text in a detailed, accurate memory (like a Transformer). Information outside that window is then summarized and compressed into a smaller, long-term memory using a new component called the Artificial Hippocampus Network (AHN). This AHN uses techniques similar to modern RNNs to efficiently store and recall important information from the past. They tested different types of AHNs, like Mamba2, DeltaNet, and Gated DeltaNet.
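The mechanism described above can be sketched in a few lines: keep recent key/value pairs losslessly in a sliding window, and when a pair falls out of the window, fold it into a fixed-size recurrent state. The gated outer-product update below is an illustrative stand-in for an AHN (in the spirit of linear-attention RNNs like DeltaNet), not the paper's exact module; all names here are hypothetical.

```python
import numpy as np

class AHNMemorySketch:
    """Toy sketch of the two-part memory: a lossless sliding window of
    recent (key, value) pairs plus a fixed-size compressive state that
    absorbs whatever leaves the window."""

    def __init__(self, window_size, dim, decay=0.9):
        self.window = []                    # short-term memory: exact KV pairs
        self.window_size = window_size
        self.state = np.zeros((dim, dim))   # long-term memory: fixed size
        self.decay = decay                  # illustrative forgetting gate

    def write(self, k, v):
        """Append a new KV pair; compress the evicted pair if the window is full."""
        self.window.append((k, v))
        if len(self.window) > self.window_size:
            old_k, old_v = self.window.pop(0)
            # Recurrent compression: a decayed outer-product update,
            # so the state size never grows with sequence length.
            self.state = self.decay * self.state + np.outer(old_v, old_k)

    def read(self, q):
        """Combine an exact lookup over the window with a compressed recall."""
        long_term = self.state @ q
        short_term = sum(v * float(k @ q) for k, v in self.window)
        return long_term + short_term

# Usage: feed in four tokens with a window of two; two pairs stay exact,
# two get compressed into the fixed-size state.
mem = AHNMemorySketch(window_size=2, dim=3)
for i in range(4):
    mem.write(np.eye(3)[i % 3], np.ones(3))
out = mem.read(np.ones(3))
```

The key property is that `state` has constant size, so memory and per-token compute stay flat no matter how long the sequence gets, while the window preserves exact detail for recent context.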

Why it matters?

This work is important because it allows AI models to process much longer texts without needing massive amounts of computing power or memory. The results show that models with this new memory system perform as well as, or even better than, traditional methods while being significantly faster and using less memory. For example, adding AHNs to one model improved its score on a 128k-token benchmark while cutting inference FLOPs by 40.5% and memory cache by 74.0%.

Abstract

Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework of artificial neural networks. Our method maintains a sliding window of the Transformer's KV cache as lossless short-term memory, while a learnable module termed Artificial Hippocampus Network (AHN) recurrently compresses out-of-window information into a fixed-size compact long-term memory. To validate this framework, we instantiate AHNs using modern RNN-like architectures, including Mamba2, DeltaNet, and Gated DeltaNet. Extensive experiments on long-context benchmarks LV-Eval and InfiniteBench demonstrate that AHN-augmented models consistently outperform sliding window baselines and achieve performance comparable or even superior to full-attention models, while substantially reducing computational and memory requirements. For instance, augmenting the Qwen2.5-3B-Instruct with AHNs reduces inference FLOPs by 40.5% and memory cache by 74.0%, while improving its average score on LV-Eval (128k sequence length) from 4.41 to 5.88. Code is available at: https://github.com/ByteDance-Seed/AHN.