KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance

Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin, Yu Sun, Hua Wu

2026-04-15

Summary

This paper introduces a new method called KnowRL to improve how large language models solve complex reasoning problems. It focuses on making the learning process more efficient by providing targeted help, rather than overwhelming the model with too much information.

What's the problem?

Large language models are getting better at reasoning, but they often struggle with difficult problems because the feedback they receive during learning is very sparse: they rarely get a signal telling them how to improve. Existing methods try to help by giving hints, but these hints often add unnecessary information, introduce inconsistencies, or require a lot of extra training time.

What's the solution?

KnowRL tackles this by breaking hints down into small, essential pieces of knowledge, called knowledge points, and then carefully selecting only the most helpful pieces for each problem during training. The system also handles a tricky interaction: removing one hint can be harmless on its own, yet removing several such hints together can make things worse, so the selection process is optimized to stay robust under these dependencies. The authors applied KnowRL to a 1.5-billion-parameter model, training KnowRL-Nemotron-1.5B from OpenMath-Nemotron-1.5B.
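To make the interaction issue concrete, here is a minimal sketch of how a subset of knowledge points might be pruned. This is not the paper's actual Constrained Subset Search: the `utility` function, the greedy backward elimination, and the toy scoring below are all illustrative assumptions. The key idea it demonstrates is re-scoring each candidate removal against the current subset, rather than scoring removals independently, which is what avoids the "removing one hint helps, removing several hurts" trap.

```python
def select_kps(kps, utility, budget):
    """Prune knowledge points (KPs) down to a compact subset.

    Each candidate removal is re-evaluated against the *current* subset,
    so interactions between KPs are accounted for at every step, instead
    of deciding all removals up front from independent per-KP scores.
    """
    selected = list(kps)
    while len(selected) > budget:
        # Drop the KP whose removal leaves the highest joint utility.
        best_drop = max(
            selected,
            key=lambda kp: utility([k for k in selected if k != kp]),
        )
        selected.remove(best_drop)
    return selected


def toy_utility(subset):
    """Toy scoring: 'a' and 'b' are individually modest but give a bonus
    when kept together, so dropping either alone looks almost harmless,
    while dropping both loses the bonus."""
    s = set(subset)
    score = sum({"a": 1.0, "b": 1.0, "c": 0.1}.get(k, 0.0) for k in s)
    if "a" in s and "b" in s:
        score += 0.5
    return score


print(select_kps(["a", "b", "c"], toy_utility, budget=2))  # → ['a', 'b']
```

Note that scoring removals independently would rank "a" and "b" as equally droppable as each other, yet joint re-evaluation correctly discards only the low-value "c".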

Why it matters?

This research is important because it shows a way to significantly improve the reasoning abilities of large language models without needing massive amounts of extra data or training. The KnowRL-enhanced model outperforms previous methods across eight reasoning benchmarks, achieving new state-of-the-art performance at its scale, and the code and data are publicly available for others to use and build upon.

Abstract

RLVR improves reasoning in large language models, but its effectiveness is often limited by severe reward sparsity on hard problems. Recent hint-based RL methods mitigate sparsity by injecting partial solutions or abstract templates, yet they typically scale guidance by adding more tokens, which introduce redundancy, inconsistency, and extra training overhead. We propose KnowRL (Knowledge-Guided Reinforcement Learning), an RL training framework that treats hint design as a minimal-sufficient guidance problem. During RL training, KnowRL decomposes guidance into atomic knowledge points (KPs) and uses Constrained Subset Search (CSS) to construct compact, interaction-aware subsets for training. We further identify a pruning interaction paradox -- removing one KP may help while removing multiple such KPs can hurt -- and explicitly optimize for robust subset curation under this dependency structure. We train KnowRL-Nemotron-1.5B from OpenMath-Nemotron-1.5B. Across eight reasoning benchmarks at the 1.5B scale, KnowRL-Nemotron-1.5B consistently outperforms strong RL and hinting baselines. Without KP hints at inference, KnowRL-Nemotron-1.5B reaches 70.08 average accuracy, already surpassing Nemotron-1.5B by +9.63 points; with selected KPs, performance improves to 74.16, establishing a new state of the art at this scale. The model, curated training data, and code are publicly available at https://github.com/Hasuer/KnowRL.