
CoT-Valve: Length-Compressible Chain-of-Thought Tuning

Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang

2025-02-14


Summary

This paper introduces CoT-Valve, a new way to make AI models reason more efficiently by adjusting how much they explain their reasoning based on how hard a problem is. It's like teaching a computer to show its work on a math problem, but only as much as the problem actually needs.

What's the problem?

When AI models use Chain-of-Thought to explain their reasoning, it helps them think better, but it also makes them use a lot more computer power and time. This is because they often give long explanations even for simple problems, which is unnecessary and wasteful.

What's the solution?

The researchers created CoT-Valve, which teaches AI models to adjust the length of their explanations based on how difficult a task is. They found that this length can be controlled by nudging the model's parameters along a specific direction, with a dial that sets how far to move. They also built special datasets containing both long and short explanations for the same questions to train the AI. Using this method, they were able to make the AI give much shorter explanations for simpler problems while losing very little accuracy.
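The core idea above can be sketched in a few lines. This is an illustrative toy, not the authors' code: it assumes a single "direction" `delta` in parameter space (found by comparing long-CoT and short-CoT tuning) whose scale `alpha` acts as the valve controlling chain length.

```python
import numpy as np

def apply_cot_valve(base_weights, delta, alpha):
    """Shift model weights along the length-controlling direction.

    alpha = 0.0 -> base model (long reasoning chains)
    alpha = 1.0 -> fully shifted model (short reasoning chains)
    Intermediate alpha values give intermediate chain lengths.
    """
    return {name: w + alpha * delta[name] for name, w in base_weights.items()}

# Hypothetical tiny "model" with two weight matrices, for illustration only.
base = {"layer1": np.ones((2, 2)), "layer2": np.zeros((2, 2))}
direction = {"layer1": -0.5 * np.ones((2, 2)), "layer2": np.ones((2, 2))}

short_cot = apply_cot_valve(base, direction, alpha=1.0)   # shortest chains
medium_cot = apply_cot_valve(base, direction, alpha=0.5)  # in between
```

Because the valve is just a scalar on one weight delta, a single model can serve every point on the long-to-short spectrum without retraining.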

Why it matters?

This matters because it can make AI systems much more efficient and cost-effective. By using shorter explanations for easier tasks, AI can work faster and use less computer power, which saves money and energy. This could help make advanced AI more accessible and useful for everyday applications, while still allowing it to tackle complex problems when needed.

Abstract

Chain-of-Thought significantly enhances a model's reasoning capability, but it also comes with a considerable increase in inference costs due to long chains. With the observation that the reasoning path can be easily compressed for easy tasks but struggles under compression on hard tasks, we explore the feasibility of elastically controlling the length of reasoning paths with only one model, thereby reducing the inference overhead of reasoning models dynamically based on task difficulty. We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. To achieve this, we propose to identify a direction in the parameter space that, when manipulated, can effectively control the length of generated CoT. Moreover, we show that this property is valuable for compressing the reasoning chain. We construct datasets with chains from long to short for the same questions and explore two enhanced strategies for CoT-Valve: (1) a precise length-compressible CoT tuning method, and (2) a progressive chain length compression approach. Our experiments show that CoT-Valve successfully enables controllability and compressibility of the chain and outperforms prompt-based control. We applied this method to QwQ-32B-Preview, reducing reasoning chains on GSM8K from 741 to 225 tokens with a minor performance drop (95.07% to 94.92%) and on AIME from 6827 to 4629 tokens, with only one additional incorrect answer.
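The "progressive chain length compression" strategy mentioned in the abstract can be illustrated with a toy simulation. This is a hedged sketch of the general idea only, with assumed details: in each round the model is retuned toward its own shortest outputs, so the average chain length shrinks gradually rather than in one jump.

```python
import random

random.seed(0)

def sample_chain_lengths(mean_len, n=100):
    """Pretend to sample n reasoning-chain lengths around the current mean."""
    return [max(1, int(random.gauss(mean_len, mean_len * 0.2))) for _ in range(n)]

def progressive_compression(start_len, rounds=3, keep_frac=0.25):
    """Simulate progressive compression: each round, keep the shortest
    fraction of sampled chains and shift the model's mean length toward them."""
    mean_len = start_len
    for _ in range(rounds):
        lengths = sorted(sample_chain_lengths(mean_len))
        shortest = lengths[: int(len(lengths) * keep_frac)]  # shortest quartile
        mean_len = sum(shortest) / len(shortest)             # retune toward them
    return mean_len

final = progressive_compression(741)  # starting from a GSM8K-like chain length
```

Each round shrinks the target length only modestly, which is the point of the progressive schedule: accuracy can be monitored between rounds so compression stops before it starts costing correctness.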