LightThinker++: From Reasoning Compression to Memory Management
Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang
2026-04-07
Summary
This paper introduces a technique called LightThinker, and an improved version called LightThinker++, that make large language models (LLMs) more efficient when performing complex reasoning tasks.
What's the problem?
Large language models are good at reasoning and solving problems, but they require a lot of compute and memory, especially when a problem takes many steps of thought. Because each new step must attend to everything generated so far, a longer 'thought process' makes inference progressively slower and more resource-intensive, creating a bottleneck. Naively compressing the intermediate steps, however, risks discarding details needed to reach the right conclusion.
What's the solution?
The researchers first developed LightThinker, which compresses the LLM's intermediate thoughts into compact representations. They then extended it into LightThinker++ by letting the model actively manage its own 'memory', deciding what to keep and what to discard as it reasons. A specialized training pipeline, built from synthesized reasoning trajectories, teaches the model to schedule this memory purposefully, so at each step it retains only the most important information.
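The keep-or-discard idea can be illustrated with a toy sketch. This is not the paper's actual mechanism (LightThinker++ operates on the model's internal representations via learned compression and trained memory primitives); the names `ReasoningMemory`, `compress`, and the importance scores here are purely illustrative assumptions, and string truncation stands in for learned compression.

```python
# Toy sketch of budgeted memory management over reasoning steps.
# All names and the truncation-based "compression" are illustrative,
# not the paper's real primitives.
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    summary: str       # compressed "gist" of an intermediate thought
    importance: float  # score used to schedule keep-vs-discard

class ReasoningMemory:
    def __init__(self, budget: int):
        self.budget = budget                   # max entries kept at once
        self.entries: list[MemoryEntry] = []

    def compress(self, thought: str, importance: float) -> None:
        """Store a compact summary instead of the full thought."""
        gist = thought[:40]                    # stand-in for learned compression
        self.entries.append(MemoryEntry(gist, importance))
        self._evict_if_needed()

    def _evict_if_needed(self) -> None:
        """Discard the least important entries once over budget."""
        if len(self.entries) > self.budget:
            self.entries.sort(key=lambda e: e.importance, reverse=True)
            self.entries = self.entries[: self.budget]

    def context(self) -> str:
        """Compact context carried forward to the next reasoning step."""
        return " | ".join(e.summary for e in self.entries)

mem = ReasoningMemory(budget=2)
mem.compress("Step 1: restate the problem in simpler terms", 0.3)
mem.compress("Step 2: derive the key intermediate result", 0.9)
mem.compress("Step 3: check the boundary conditions", 0.7)
print(len(mem.entries))  # stays at the budget of 2
```

The point of the sketch is the stable footprint: no matter how many reasoning steps occur, the retained context never exceeds the budget, which is what lets the real system keep memory usage flat over long horizons.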
Why it matters?
This work matters because it lets LLMs tackle longer and more complex reasoning tasks without becoming prohibitively slow or resource-hungry. The reported improvements show large reductions in memory usage and processing time, while often *improving* accuracy, especially when the model must plan and act over many steps, as in a game or simulation. This makes these powerful models more practical to deploy in real-world applications.
Abstract
Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compression often struggles with complex reasoning where the irreversible loss of intermediate details can lead to logical bottlenecks. To address this, we evolve the framework into LightThinker++, introducing Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, supported by a specialized trajectory synthesis pipeline to train purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions. (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ slashes peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget for maximum performance. (3) Most notably, in long-horizon agentic tasks, it maintains a stable footprint beyond 80 rounds (a 60%-70% reduction), achieving an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.