Associative Recurrent Memory Transformer
Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev
2024-07-09

Summary
This paper introduces the Associative Recurrent Memory Transformer (ARMT), a neural network architecture designed to handle very long sequences of information efficiently, processing each new piece of input in constant time so it does not slow down as the sequence grows.
What's the problem?
The main problem is that standard transformer models struggle with very long sequences of data, like book-length texts, because self-attention compares every new token against the entire preceding context, so the cost of each processing step grows as the sequence gets longer. This makes it impractical for them to work on tasks that require understanding or remembering a large amount of information at once.
What's the solution?
To solve this issue, the authors developed ARMT, which combines two key techniques: transformer self-attention over a local segment of the input, and segment-level recurrence with an associative memory that carries task-specific information across segments. This lets ARMT keep track of relevant details from earlier in the context while processing each new segment at a constant cost. The researchers tested ARMT on a challenging benchmark called BABILong, which involves answering questions based on very large amounts of text, and it answered single-fact questions over contexts of up to 50 million tokens with an accuracy of 79.9%.
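To make the general idea concrete, below is a minimal PyTorch sketch of segment-level recurrence combined with a simple key-value associative memory. It is an illustration of the mechanism described above, not the authors' ARMT implementation: the sizes and names (d_model, SEGMENT_LEN, n_mem) and the outer-product memory update are assumptions made for this example, and the paper's actual update rule differs.

```python
# Sketch: process a long sequence in fixed-size segments, carrying a small set
# of memory tokens and an associative (key-value) matrix between segments.
# NOT the authors' ARMT code; names and the memory update are illustrative.
import torch
import torch.nn as nn

d_model, n_heads, SEGMENT_LEN, n_mem = 64, 4, 128, 8

class SegmentRecurrentBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # Local transformer self-attention over [memory tokens; segment tokens].
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.to_k = nn.Linear(d_model, d_model)
        self.to_v = nn.Linear(d_model, d_model)

    def forward(self, mem_tokens, segment, assoc):
        # Read: query the associative matrix with the memory tokens.
        q = self.to_k(mem_tokens)                          # (B, n_mem, d)
        read = torch.einsum('bnd,bde->bne', q, assoc)      # (B, n_mem, d)
        # Local self-attention over [memory tokens + read; current segment].
        x = self.layer(torch.cat([mem_tokens + read, segment], dim=1))
        new_mem, seg_out = x[:, :n_mem], x[:, n_mem:]
        # Write: accumulate key-value outer products into a fixed-size matrix
        # (a simplification of the paper's associative memory update).
        k, v = self.to_k(new_mem), self.to_v(new_mem)
        assoc = assoc + torch.einsum('bnd,bne->bde', k, v)
        return new_mem, seg_out, assoc

def process_long_sequence(embeddings, block):
    """Split a (B, T, d) sequence into fixed-size segments and carry memory
    tokens plus an associative matrix across segments, so the per-segment
    cost stays constant no matter how long the full sequence is."""
    B = embeddings.size(0)
    mem = torch.zeros(B, n_mem, d_model)
    assoc = torch.zeros(B, d_model, d_model)
    outputs = []
    for seg in embeddings.split(SEGMENT_LEN, dim=1):
        mem, out, assoc = block(mem, seg, assoc)
        outputs.append(out)
    return torch.cat(outputs, dim=1)

# Toy usage: a 1,024-token sequence processed in 128-token segments.
block = SegmentRecurrentBlock()
x = torch.randn(2, 1024, d_model)
y = process_long_sequence(x, block)
print(y.shape)  # torch.Size([2, 1024, 64])
```

The key property illustrated here is that the state carried between segments (the memory tokens and the associative matrix) has a fixed size, so processing each new segment costs the same amount of compute regardless of how much context has already been seen.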
Why it matters?
This research is important because it improves how well AI systems can understand and remember large amounts of information. By letting models process long sequences both quickly and accurately, ARMT could enhance applications such as natural language processing, data analysis, and any field that requires handling extensive documents or datasets efficiently.
Abstract
This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tasks and sets a new performance record in the recent BABILong multi-task long-context benchmark by answering single-fact questions over 50 million tokens with an accuracy of 79.9%. The source code for training and evaluation is available on GitHub.