LightMem: Lightweight and Efficient Memory-Augmented Generation
Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Huajun Chen, Ningyu Zhang
2025-10-22
Summary
This paper introduces LightMem, a new way for large language models (LLMs) to remember and use information from past conversations or interactions, making them better at tasks that require keeping track of things over time.
What's the problem?
Large language models are really good at many things, but they often struggle to remember what happened earlier in a conversation or a complex task. Existing methods to give LLMs memory often slow them down significantly, requiring a lot of computing power and time. Essentially, they can't efficiently balance remembering things and responding quickly.
What's the solution?
The researchers created LightMem, a memory system inspired by how human memory works. It has three stages: a quick 'sensory memory' that filters out unimportant information, a 'short-term memory' that organizes important information by topic, and a 'long-term memory' that updates when the system isn't actively working on a task. This design allows LightMem to store and retrieve information efficiently, without slowing down the LLM too much.
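The three-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual LightMem implementation: the class, method names, and the `score_fn`/`topic_fn`/`summarize_fn` callbacks are hypothetical stand-ins for the paper's compression, topic-grouping, and consolidation components.

```python
# Hypothetical sketch of a three-stage memory pipeline in the spirit of
# LightMem. Names and signatures are illustrative, not the real API.
from collections import defaultdict

class ThreeStageMemory:
    def __init__(self, relevance_threshold=0.5):
        self.relevance_threshold = relevance_threshold
        self.short_term = defaultdict(list)  # topic -> recent kept messages
        self.long_term = {}                  # topic -> consolidated summary

    def ingest(self, message, score_fn, topic_fn):
        """Stage 1 (sensory): drop low-relevance input, tag the rest
        with a topic. Stage 2 (short-term): group kept messages by topic."""
        if score_fn(message) < self.relevance_threshold:
            return  # filtered out by the 'sensory memory' stage
        self.short_term[topic_fn(message)].append(message)

    def sleep_time_update(self, summarize_fn):
        """Stage 3 (long-term): offline consolidation, decoupled from
        online inference, merging each topic group into a summary."""
        for topic, entries in self.short_term.items():
            self.long_term[topic] = summarize_fn(entries)
        self.short_term.clear()

mem = ThreeStageMemory()
mem.ingest("Flights to Tokyo cost $900.",
           score_fn=lambda m: 0.9, topic_fn=lambda m: "travel")
mem.ingest("uh okay",
           score_fn=lambda m: 0.1, topic_fn=lambda m: "chitchat")
mem.sleep_time_update(summarize_fn=lambda entries: " ".join(entries))
print(mem.long_term)  # only the relevant, topic-tagged message survives
```

The key design point the sketch mirrors is that the expensive step (`sleep_time_update`) runs outside the request path, so online responses only pay for the cheap filter-and-group stages.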
Why it matters?
LightMem is important because it makes LLMs more practical for real-world applications that require remembering past interactions. The experiments showed it improves accuracy, uses fewer resources like processing tokens and API calls, and runs much faster than other memory systems, meaning it could lead to more responsive and capable AI assistants and chatbots.
Abstract
Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups the remaining information by topic. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. Experiments on LongMemEval with GPT and Qwen backbones show that LightMem outperforms strong baselines in accuracy (gains of up to 10.9%) while reducing token usage by up to 117x, API calls by up to 159x, and runtime by over 12x. The code is available at https://github.com/zjunlp/LightMem.