LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

Jingyuan Wang, Yankai Chen, Zhonghang Li, Chao Huang

2025-10-13

Summary

This paper introduces a new method called LightReasoner that helps large language models (LLMs) get better at reasoning tasks, like solving math problems, without needing huge amounts of training data or computing power.

What's the problem?

Currently, improving LLM reasoning requires a lot of resources. Training usually involves feeding the model massive, human-curated datasets, and even then the process isn't very efficient: it treats every token in the data as equally important, when in reality only a small fraction actually helps the model learn. It's like studying every single word in a textbook instead of focusing on the key concepts.

What's the solution?

LightReasoner uses a 'teaching' approach. It takes a smaller, less capable language model and lets it attempt the same problems as a larger, more capable one. By comparing where the two models' predictions diverge, where the larger model is confident and the smaller model struggles, the system identifies the most important 'reasoning moments'. These moments are then turned into focused training examples for the larger model, highlighting exactly where its advantage lies. It's like a tutor pointing out exactly what a student needs to work on.
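The sampling idea described above can be sketched in a few lines. This is a simplified illustration, not the paper's actual implementation: it assumes we already have each model's next-token probability distribution at every reasoning step, and it uses KL divergence as a stand-in for whatever expert-amateur contrast score the authors use. The function names (`kl_divergence`, `select_reasoning_moments`) and the toy distributions are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two next-token probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_reasoning_moments(expert_steps, amateur_steps, top_k=2):
    """Rank reasoning steps by expert-amateur divergence, keep the top_k.

    expert_steps / amateur_steps: one next-token probability distribution
    per step, from the large (expert) and small (amateur) model respectively.
    Steps where the two models sharply disagree are treated as the
    high-value 'reasoning moments' worth supervising.
    """
    scored = [
        (i, kl_divergence(p, q))
        for i, (p, q) in enumerate(zip(expert_steps, amateur_steps))
    ]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [i for i, _ in scored[:top_k]]

# Toy distributions over a 3-token vocabulary at 4 reasoning steps.
expert  = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3], [0.8, 0.1, 0.1], [0.34, 0.33, 0.33]]
amateur = [[0.85, 0.1, 0.05], [0.4, 0.3, 0.3], [0.2, 0.4, 0.4], [0.33, 0.34, 0.33]]

print(select_reasoning_moments(expert, amateur, top_k=1))  # → [2]
```

In this toy run, step 2 is selected because the expert is confident where the amateur is not, exactly the kind of moment the tutoring analogy describes.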

Why it matters?

This research is important because it offers a way to make LLMs smarter without requiring massive datasets or expensive training runs. By using smaller models to guide the learning process, it makes improving LLM reasoning more scalable and accessible, potentially leading to faster advancements in artificial intelligence.

Abstract

Large language models (LLMs) have demonstrated remarkable progress in reasoning, often through supervised fine-tuning (SFT). However, SFT is resource-intensive, relying on large curated datasets, rejection-sampled demonstrations, and uniform optimization across all tokens, even though only a fraction carry meaningful learning value. In this work, we explore a counterintuitive idea: can smaller language models (SLMs) teach larger language models (LLMs) by revealing high-value reasoning moments that reflect the latter's unique strength? We propose LightReasoner, a novel framework that leverages the behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM). LightReasoner operates in two stages: (1) a sampling stage that pinpoints critical reasoning moments and constructs supervision examples capturing the expert's advantage through expert-amateur contrast, and (2) a fine-tuning stage that aligns the expert model with these distilled examples, amplifying its reasoning strengths. Across seven mathematical benchmarks, LightReasoner improves accuracy by up to 28.1%, while reducing time consumption by 90%, sampled problems by 80%, and tuned token usage by 99%, all without relying on ground-truth labels. By turning weaker SLMs into effective teaching signals, LightReasoner offers a scalable and resource-efficient approach for advancing LLM reasoning. Code is available at: https://github.com/HKUDS/LightReasoner
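The abstract's second stage, fine-tuning the expert only on the distilled examples, implies a loss that ignores most tokens, which is how the 99% reduction in tuned token usage becomes possible. A minimal sketch of such a masked objective, assuming we have the expert's log-probability for each target token and a 0/1 mask marking the contrast-selected tokens (both names and the masking scheme are illustrative, not the paper's code):

```python
import math

def masked_nll(token_logprobs, supervision_mask):
    """Negative log-likelihood restricted to contrast-selected tokens.

    token_logprobs: the expert model's log-probability of each target token.
    supervision_mask: 1 where the token was flagged by expert-amateur
    contrast, 0 elsewhere -- unselected tokens contribute no loss.
    """
    selected = [lp for lp, m in zip(token_logprobs, supervision_mask) if m]
    return -sum(selected) / max(len(selected), 1)

# Three tokens, but only the last two were selected for supervision.
logps = [math.log(0.9), math.log(0.2), math.log(0.7)]
mask = [0, 1, 1]
print(round(masked_nll(logps, mask), 3))  # → 0.983
```

Averaging only over masked-in tokens keeps the gradient focused on the high-value reasoning moments rather than diluting it across the whole sequence.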