Compressing Chain-of-Thought in LLMs via Step Entropy
Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu
2025-08-12
Summary
This paper presents a way to make large language models faster and more efficient when they use Chain-of-Thought (CoT) reasoning. Chain-of-Thought is a technique in which the model works through a problem step by step before giving its final answer. The paper introduces a method that compresses these step-by-step explanations using a measure called step entropy, combined with a two-stage training strategy.
What's the problem?
While Chain-of-Thought reasoning helps models solve complex problems more reliably by working through them in steps, it also makes inference slower and more expensive: the model must generate and process many intermediate tokens for every answer, which increases latency and computing cost.
What's the solution?
The solution uses step entropy, a measure of how much new information each reasoning step contributes, to identify and keep only the important parts of the reasoning chain. Low-entropy, redundant steps are pruned, and the model is then trained in two stages to produce these shorter but still accurate reasoning chains on its own. This reduces the number of tokens the model generates, making inference faster with little loss in accuracy.
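To make the idea concrete, here is a minimal sketch of entropy-based step pruning. It assumes step entropy is the mean token-level negative log-probability within a step, and that a fixed fraction of the highest-entropy steps is kept; the paper's exact definitions, thresholds, and the `keep_ratio` parameter here are illustrative assumptions, not the authors' implementation.

```python
import math

def step_entropy(token_probs):
    """Mean token-level entropy (negative log-probability) for one
    reasoning step. `token_probs` holds the model's probability for
    each generated token in the step (a simplifying assumption; the
    paper's exact definition may differ)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def compress_cot(steps, keep_ratio=0.66):
    """Keep the highest-entropy reasoning steps, preserving their
    original order. `steps` is a list of (text, token_probs) pairs;
    `keep_ratio` is a hypothetical pruning budget."""
    scored = [(step_entropy(probs), i, text)
              for i, (text, probs) in enumerate(steps)]
    n_keep = max(1, round(keep_ratio * len(steps)))
    # Take the top-entropy steps, then restore generation order.
    kept = sorted(sorted(scored, reverse=True)[:n_keep], key=lambda t: t[1])
    return [text for _, _, text in kept]

steps = [
    ("Restate the problem.", [0.9, 0.95, 0.9]),        # predictable -> low entropy
    ("Set x = 2y from eq. 1.", [0.5, 0.4, 0.6]),       # informative -> high entropy
    ("Therefore the answer is 14.", [0.3, 0.5, 0.4]),  # informative -> high entropy
]
print(compress_cot(steps))  # the low-entropy restatement is pruned
```

In practice the pruned chains would then serve as training targets for the paper's two-stage fine-tuning, teaching the model to emit the compressed reasoning directly.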
Why it matters?
Making Chain-of-Thought reasoning more efficient means AI models can solve hard problems faster and with less energy. This makes them more practical for everyday use, especially in tasks such as math, coding, and complex decision-making, where clear step-by-step reasoning is important.
Abstract
A novel CoT compression framework using step entropy and a two-stage training strategy enhances LLM inference efficiency without significantly reducing accuracy.