ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang
2025-02-11
Summary
This paper introduces ReasonFlux, a new approach that makes AI models better at solving complex math problems by combining a library of thought templates with hierarchical reasoning.
What's the problem?
Current AI models, even powerful ones, struggle with complex mathematical reasoning. They typically attempt a problem in one long chain of thought, which often fails on tricky math questions.
What's the solution?
The researchers created ReasonFlux, which draws on a library of about 500 high-level thought templates to break complex problems into smaller, manageable steps. Hierarchical reinforcement learning trains the model to plan the best sequence of templates (a "template trajectory") for a given problem, and an inference-time scaling system adjusts how many templates are used based on the problem's difficulty.
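To make the idea concrete, here is a minimal sketch of template retrieval and trajectory planning. All names (`ThoughtTemplate`, `retrieve`, `plan_trajectory`) and the tag-overlap scoring are hypothetical illustrations, not the paper's implementation, which uses a much richer template library and an RL-trained planner:

```python
from dataclasses import dataclass

@dataclass
class ThoughtTemplate:
    name: str
    tags: set          # topics the template applies to
    instruction: str   # the high-level reasoning step it encodes

def retrieve(library, problem_tags, k=3):
    """Rank templates by tag overlap with the problem (toy retrieval)."""
    scored = sorted(library, key=lambda t: len(t.tags & problem_tags), reverse=True)
    return scored[:k]

def plan_trajectory(templates):
    """A trained planner would order the templates; here we keep retrieval order."""
    return [t.instruction for t in templates]

library = [
    ThoughtTemplate("substitution", {"algebra", "equation"}, "Substitute to reduce variables"),
    ThoughtTemplate("bounding", {"inequality"}, "Bound each side separately"),
    ThoughtTemplate("factoring", {"algebra", "polynomial"}, "Factor the expression"),
]
steps = plan_trajectory(retrieve(library, {"algebra", "equation"}, k=2))
# steps is a short ordered plan of high-level moves for this problem
```

The key design point mirrored here is that planning happens over a small number of high-level templates rather than over every low-level token of a long chain of thought, which shrinks the search space.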
Why does it matter?
This matters because ReasonFlux makes AI much better at solving hard math problems. It outperformed some of the best existing models, including OpenAI o1-preview and DeepSeek-V3, on difficult benchmarks such as MATH and the olympiad-level AIME. This could lead to AI systems that are much more helpful for students, researchers, and anyone dealing with complex mathematical reasoning tasks.
Abstract
We present that hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space and outperform the mathematical reasoning capabilities of powerful LLMs like OpenAI o1-preview and DeepSeek V3. We train our ReasonFlux-32B model with only 8 GPUs and introduce three innovations: (i) a structured and generic thought template library, containing around 500 high-level thought templates capable of generalizing to similar or relevant reasoning problems; (ii) performing hierarchical reinforcement learning on a sequence of thought templates instead of long CoTs, optimizing a base LLM to plan out an optimal template trajectory for gradually handling complex problems; (iii) a brand-new inference scaling system that enables hierarchical LLM reasoning by adaptively scaling thought templates at inference time. With a template trajectory containing sequential thought templates, our ReasonFlux-32B significantly advances math reasoning capabilities to state-of-the-art levels. Notably, on the MATH benchmark, it achieves an accuracy of 91.2% and surpasses o1-preview by 6.7%. On the USA Math Olympiad (AIME) benchmark, ReasonFlux-32B solves an average of 56.7% of problems, surpassing o1-preview and DeepSeek-V3 by 27% and 45%, respectively. Code: https://github.com/Gen-Verse/ReasonFlux
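The third innovation, adaptive inference-time scaling, can be sketched as a loop that keeps applying templates until a check passes, so easy problems naturally consume fewer steps than hard ones. Everything below (`planner`, `solver`, `verifier`, and the halving toy problem) is a hypothetical illustration of that control flow, not the paper's system:

```python
def hierarchical_solve(problem, planner, solver, verifier, max_steps=8):
    """Toy inference loop: the planner picks the next thought template,
    the solver instantiates it on the current state, and the loop stops
    early once the verifier is satisfied -- so the number of template
    steps adapts to problem difficulty."""
    state = problem
    trajectory = []
    for _ in range(max_steps):
        template = planner(state)
        state = solver(state, template)
        trajectory.append(template)
        if verifier(state):
            break
    return state, trajectory

# Toy demo: "halve" is the only template; larger inputs need more steps.
final_state, traj = hierarchical_solve(
    8,
    planner=lambda s: "halve",
    solver=lambda s, t: s // 2,
    verifier=lambda s: s <= 1,
)
```

In this toy run, an input of 8 takes three halving steps while an input of 2 takes one, mimicking how a harder problem earns a longer template trajectory.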