
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao

2025-01-23


Summary

This paper introduces O1-Pruner, a method for making AI models that use long, complex reasoning processes (like OpenAI's O1) run faster without losing accuracy. It's like teaching a smart student to solve math problems more efficiently without making mistakes.

What's the problem?

AI models that use long, detailed thinking processes (called long-thought reasoning) are really good at solving complex problems, kind of like how humans think through tough questions. But this takes a lot of time, which can be a big issue when you need quick answers. It's like having a super smart friend who always gives great advice, but takes forever to respond.

What's the solution?

The researchers created O1-Pruner, which is like a special training program for these AI models. First, it samples the model's answers to measure how accurate it currently is and how long its answers tend to be. Then it uses a technique called reinforcement learning to reward the AI for giving shorter answers that are just as accurate. It's like teaching that smart friend to give equally good advice but in fewer words. They tested this on different math benchmarks and found that the AI could solve problems much faster, and sometimes even more accurately, than before.
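The two steps described above, pre-sampling a baseline and then rewarding shorter-but-still-correct answers, can be sketched roughly in code. Everything here is an illustrative assumption (function names, the weight `lam`, the exact shape of the reward), not the authors' actual implementation:

```python
def estimate_baseline(samples):
    """Pre-sampling step: average answer length and accuracy of the
    un-tuned model on a problem, from a list of (token_length, is_correct)
    pairs collected by sampling several answers."""
    mean_len = sum(length for length, _ in samples) / len(samples)
    accuracy = sum(correct for _, correct in samples) / len(samples)
    return mean_len, accuracy


def length_harmonizing_reward(pred_len, pred_correct, base_len, base_acc, lam=2.0):
    """Hypothetical RL-style reward: pay for being shorter than the
    pre-sampled baseline, and penalize any accuracy drop via `lam`."""
    shortening = base_len / pred_len - 1.0            # > 0 when shorter than baseline
    accuracy_gap = lam * (float(pred_correct) - base_acc)
    return shortening + accuracy_gap
```

Under a reward like this, a short correct answer scores higher than a long correct one, and a wrong answer is penalized regardless of length, which is what pushes the fine-tuned model toward concise reasoning that still solves the problem.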

Why it matters?

This matters because it could make powerful AI models much more useful in real-world situations where we need quick and accurate answers. Imagine having a super-smart AI assistant that can help with complex math or science problems almost instantly, instead of taking minutes to respond. This could be really helpful in fields like education, scientific research, or even in emergency situations where quick, accurate thinking is crucial. It's a big step towards making advanced AI more practical and efficient for everyday use.

Abstract

Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is reducing the inference overhead of long-thought LLMs while ensuring accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to effectively allocate token budgets based on problem difficulty, resulting in reasoning redundancies. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to achieve efficient reasoning with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner
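The RL-style fine-tuning loop the abstract describes could plug a baseline-relative reward into a policy-gradient update. A minimal sketch, where `policy_sample` (a stand-in for generation plus answer checking), `baselines` (the pre-sampled statistics per problem), and the weight `lam` are all assumptions rather than the paper's exact formulation:

```python
def collect_training_signal(dataset, policy_sample, baselines, lam=2.0):
    """Sketch of one data-collection pass for RL-style fine-tuning:
    sample an answer per problem, score it with a length-harmonizing
    reward against that problem's pre-sampled baseline, and emit
    (problem, sample, reward) triples for a policy-gradient update.

    policy_sample(problem) -> (token_length, is_correct)
    baselines[problem]     -> (baseline_length, baseline_accuracy)
    """
    batch = []
    for problem in dataset:
        length, correct = policy_sample(problem)      # generate + check answer
        base_len, base_acc = baselines[problem]
        # Reward shortening relative to the baseline, under an
        # accuracy constraint enforced softly by the penalty weight lam.
        reward = (base_len / length - 1.0) + lam * (float(correct) - base_acc)
        batch.append((problem, (length, correct), reward))
    return batch
```

A trainer would then scale each sample's log-likelihood gradient by its reward, so the model drifts toward shorter traces only when correctness is preserved.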