SimpleMem: Efficient Lifelong Memory for LLM Agents
Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao
2026-01-06
Summary
This paper introduces a new way for AI agents, powered by large language models, to remember past interactions and use that memory effectively over long periods of time.
What's the problem?
Current methods for giving AI agents memory have drawbacks. Simply saving everything the agent experiences takes up a lot of space and processing power, because much of that history is repetitive. Other methods try to filter out unimportant information, but the filtering itself requires significant computation and can be expensive in terms of 'tokens' (the basic units of text the AI processes).
What's the solution?
The researchers developed a system called SimpleMem that compresses memories while preserving their meaning. It works in three steps: first, it filters out low-information content and organizes past interactions into concise 'memory units' indexed from multiple perspectives. Second, it merges related memory units into more abstract, higher-level summaries to remove redundancy. Finally, it retrieves only the most relevant memories for the current question or task, widening or narrowing its search depending on how complex the task is. A rough sketch of this pipeline appears below.
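To make the three steps concrete, here is a minimal sketch in Python. It uses toy stand-ins (keyword overlap instead of semantic similarity, a length threshold instead of the paper's entropy-aware filtering), and all names such as `MemoryUnit`, `compress`, `consolidate`, and `retrieve` are hypothetical placeholders, not SimpleMem's actual interface.

```python
# A minimal, self-contained sketch of the three-step idea using toy
# heuristics: word overlap stands in for semantic similarity, and a
# length threshold stands in for the paper's entropy-aware filtering.
# Every name below is a hypothetical placeholder, not SimpleMem's API.
from dataclasses import dataclass, field

def _words(text: str) -> set[str]:
    return {w.lower().strip(".,!?") for w in text.split()}

@dataclass
class MemoryUnit:
    text: str
    views: set[str] = field(default_factory=set)  # multi-view index (here: keywords)

def compress(turns: list[str]) -> list[MemoryUnit]:
    """Step 1: drop low-information turns; keep the rest as indexed units."""
    return [MemoryUnit(t, _words(t)) for t in turns if len(_words(t)) >= 4]

def consolidate(units: list[MemoryUnit], overlap: float = 0.5) -> list[MemoryUnit]:
    """Step 2: fold units with heavily overlapping views into one summary unit."""
    merged: list[MemoryUnit] = []
    for unit in units:
        for kept in merged:
            jaccard = len(unit.views & kept.views) / (len(unit.views | kept.views) or 1)
            if jaccard >= overlap:          # redundant: merge instead of appending
                kept.text += " | " + unit.text
                kept.views |= unit.views
                break
        else:
            merged.append(unit)
    return merged

def retrieve(query: str, store: list[MemoryUnit]) -> list[MemoryUnit]:
    """Step 3: rank by overlap with the query; longer queries widen the scope."""
    q = _words(query)
    k = 1 if len(q) <= 5 else 3             # stand-in for query-complexity estimation
    return sorted(store, key=lambda u: len(u.views & q), reverse=True)[:k]

# Tiny demo: two redundant facts collapse into one unit before retrieval.
store = consolidate(compress([
    "I moved to Berlin last month.",
    "ok",
    "I moved to Berlin in May.",
]))
print(retrieve("Where does the user live now?", store))
```

The paper's versions of these steps are semantic rather than keyword-based, but the data flow is the same: compress, then merge, then retrieve with a scope that adapts to the query.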
Why it matters?
This research is important because it makes AI agents more efficient and capable of handling complex, ongoing tasks. SimpleMem improves accuracy, speeds up processing, and significantly reduces the amount of information the AI needs to consider, making it a more practical solution for real-world applications where long-term memory is crucial.
Abstract
To support reliable long-term interaction in complex environments, LLM agents require memory systems that efficiently manage historical experiences. Existing approaches either retain full interaction histories via passive context extension, leading to substantial redundancy, or rely on iterative reasoning to filter noise, incurring high token costs. To address this challenge, we introduce SimpleMem, an efficient memory framework based on semantic lossless compression. We propose a three-stage pipeline designed to maximize information density and token utilization: (1) Semantic Structured Compression, which applies entropy-aware filtering to distill unstructured interactions into compact, multi-view indexed memory units; (2) Recursive Memory Consolidation, an asynchronous process that integrates related units into higher-level abstract representations to reduce redundancy; and (3) Adaptive Query-Aware Retrieval, which dynamically adjusts retrieval scope based on query complexity to construct precise context efficiently. Experiments on benchmark datasets show that our method consistently outperforms baseline approaches in accuracy, retrieval efficiency, and inference cost, achieving an average F1 improvement of 26.4% while reducing inference-time token consumption by up to 30-fold, demonstrating a superior balance between performance and efficiency. Code is available at https://github.com/aiming-lab/SimpleMem.
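One detail worth highlighting from the abstract is that consolidation runs asynchronously, off the critical path of the agent's responses. The sketch below illustrates that scheduling pattern with a simple background task; the pattern and all names here are assumptions for illustration, and this `consolidate` placeholder merely deduplicates exact repeats, whereas the paper's stage merges semantically related units into higher-level summaries.

```python
# An illustrative sketch of running consolidation asynchronously, so memory
# compaction never blocks the agent while it answers queries. The scheduling
# pattern and all names here are assumptions; the paper only states that
# consolidation happens as an asynchronous process.
import asyncio

def consolidate(store: list[str]) -> list[str]:
    # Placeholder: deduplicate exact repeats, preserving order.
    return list(dict.fromkeys(store))

async def consolidator(store: list[str], every: float = 0.1) -> None:
    """Periodically compact the memory store in place."""
    while True:
        await asyncio.sleep(every)
        store[:] = consolidate(store)

async def main() -> None:
    store = ["user likes tea", "user likes tea", "meeting moved to 3 pm"]
    task = asyncio.create_task(consolidator(store))
    await asyncio.sleep(0.3)      # the agent would keep serving queries here
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    print(store)                  # ['user likes tea', 'meeting moved to 3 pm']

asyncio.run(main())
```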