
LM2: Large Memory Models

Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis

2025-02-11


Summary

This paper introduces LM2, a new type of AI model with a built-in memory system that helps it handle long and complex tasks, like reasoning over large amounts of information or solving multi-step problems.

What's the problem?

Regular Transformer-based AI models struggle with tasks that require understanding and processing long pieces of information. They often lose track of important details over time because they don’t have a way to store and retrieve long-term information effectively.

What's the solution?

The researchers created LM2, which adds a memory module to the standard Transformer design. This memory system works like a storage unit that interacts with the model’s inputs, helping it keep track of important details. The memory updates dynamically using mechanisms that decide what to keep, forget, or update. This allows LM2 to perform better on tasks like multi-step reasoning and answering questions based on long texts.
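To make the keep/forget/update idea concrete, here is a minimal NumPy sketch of a gated memory module of this general kind. This is an illustration under assumptions, not the paper's exact architecture: the function names (`memory_read`, `memory_write`), the gate weights `W_in` and `W_forget`, and the slot/dimension sizes are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # embedding dimension (illustrative)
n_slots = 4    # number of memory slots (illustrative)
n_tokens = 5   # input sequence length

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def memory_read(memory, tokens):
    # Read path: input tokens (queries) cross-attend to memory slots
    # (keys/values), yielding one memory-informed vector per token.
    attn = softmax(tokens @ memory.T / np.sqrt(d))   # (n_tokens, n_slots)
    return attn @ memory                             # (n_tokens, d)

def memory_write(memory, tokens, W_in, W_forget):
    # Write path: memory slots attend to the tokens to form a candidate
    # update; input/forget gates then decide what to store vs. retain.
    attn = softmax(memory @ tokens.T / np.sqrt(d))   # (n_slots, n_tokens)
    candidate = attn @ tokens                        # (n_slots, d)
    g_in = sigmoid(candidate @ W_in)                 # input gate
    g_forget = sigmoid(memory @ W_forget)            # forget gate
    return g_forget * memory + g_in * np.tanh(candidate)

memory = rng.standard_normal((n_slots, d))
tokens = rng.standard_normal((n_tokens, d))
W_in = rng.standard_normal((d, d)) * 0.1
W_forget = rng.standard_normal((d, d)) * 0.1

mem_out = memory_read(memory, tokens)   # complementary memory pathway
hidden = tokens + mem_out               # added back to the main token stream
memory = memory_write(memory, tokens, W_in, W_forget)
```

Note that the read result is added to the tokens rather than replacing them, mirroring the design goal of keeping the original Transformer information flow intact while layering a memory pathway on top.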

Why it matters?

This matters because it makes AI systems much better at handling complex tasks that involve large amounts of information. LM2 improves performance without sacrificing the general abilities of the model, making it useful for applications like legal research, scientific analysis, or any task requiring deep reasoning over long contexts.

Abstract

This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a contextual representation repository, interacting with input tokens via cross-attention and updating through gating mechanisms. To preserve the Transformer's general-purpose capabilities, LM2 maintains the original information flow while integrating a complementary memory pathway. Experimental results on the BABILong benchmark demonstrate that the LM2 model outperforms both the memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3% on average across tasks. LM2 exhibits exceptional capabilities in multi-hop inference, numerical reasoning, and large-context question-answering. On the MMLU dataset, it achieves a 5.0% improvement over a pre-trained vanilla model, demonstrating that its memory module does not degrade performance on general tasks. Further, in our analysis, we explore memory interpretability, the effectiveness of memory modules, and test-time behavior. Our findings emphasize the importance of explicit memory in enhancing Transformer architectures.