Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Zhenting Qi, Mingyuan Ma, Jiahang Xu, Li Lyna Zhang, Fan Yang, Mao Yang

2024-08-13

Summary

This paper presents rStar, a new method that enhances the problem-solving abilities of smaller language models (SLMs) by using a technique called mutual reasoning without needing to fine-tune them or use larger models.

What's the problem?

Small language models often struggle with complex reasoning tasks because they lack the advanced capabilities of larger models. This makes it hard for them to solve challenging problems effectively, limiting their usefulness in applications that require deep understanding.

What's the solution?

The authors developed rStar, which uses a self-play mutual reasoning approach: one SLM generates candidate reasoning trajectories while a second SLM of similar capability verifies them. Trajectories that both models agree on are treated as more reliable, leading to more accurate answers. The method was tested on a range of reasoning benchmarks and showed significant accuracy improvements for several small models.
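The generate-then-verify loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generator_slm` and `discriminator_slm` are toy stand-ins for real model calls, and the trajectories and answers are hypothetical.

```python
# Toy sketch of rStar-style mutual reasoning: a generator proposes
# reasoning trajectories, a discriminator independently completes
# them, and only mutually agreed trajectories are kept.

def generator_slm(question):
    """Simulate an SLM sampling several reasoning trajectories,
    each ending in a final answer."""
    return [{"steps": "add then carry", "answer": 42},
            {"steps": "multiply directly", "answer": 41},
            {"steps": "carry then add", "answer": 42}]

def discriminator_slm(question, partial_steps):
    """Simulate a second, similarly capable SLM that re-derives a
    final answer from a partial trajectory."""
    return 42 if "add" in partial_steps else 40

def mutual_reasoning(question):
    # Keep only trajectories where the discriminator, working from
    # the partial steps, reaches the same final answer ("mutually
    # consistent" trajectories in the paper's terminology).
    agreed = [t for t in generator_slm(question)
              if discriminator_slm(question, t["steps"]) == t["answer"]]
    if not agreed:
        return None
    # Majority vote over the mutually consistent trajectories.
    answers = [t["answer"] for t in agreed]
    return max(set(answers), key=answers.count)
```

In this toy run, two of the three trajectories survive verification and agree on the answer 42, which the majority vote then selects.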

Why it matters?

This research is important because it demonstrates that smaller language models can be made much more effective at solving problems without needing extensive resources or modifications. This could make advanced AI capabilities more accessible and affordable, allowing for broader applications in education, technology, and other fields.

Abstract

This paper introduces rStar, a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments the Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher quality reasoning trajectories. Next, another SLM, with capabilities similar to the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. The mutually agreed reasoning trajectories are considered mutually consistent and are thus more likely to be correct. Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct. Code will be available at https://github.com/zhentingqi/rStar.
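The abstract's first step, MCTS augmented with a set of reasoning actions, relies on a standard exploration rule for choosing which action to expand next. The sketch below shows UCT (upper-confidence) selection, the usual MCTS selection rule; the action names and visit statistics are illustrative placeholders, not values from the paper.

```python
import math

# UCT score: exploitation (average reward) plus an exploration bonus
# that favors rarely visited actions.
def uct_score(reward_sum, visits, total_visits, c=1.41):
    if visits == 0:
        return float("inf")  # always try unvisited actions first
    return reward_sum / visits + c * math.sqrt(math.log(total_visits) / visits)

# Hypothetical per-action statistics: (accumulated reward, visit count).
# rStar's actual action set (e.g. proposing sub-steps or rephrasing the
# question) is richer; these names are just for illustration.
stats = {
    "propose_substep": (8, 10),
    "rephrase_question": (3, 6),
    "answer_directly": (1, 4),
}

total = sum(v for _, v in stats.values())
best = max(stats, key=lambda a: uct_score(*stats[a], total))
```

With these numbers, `propose_substep` wins: its high average reward outweighs the larger exploration bonuses of the less-visited actions.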