
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Huatong Song, Jinhao Jiang, Yingqian Min, Jie Chen, Zhipeng Chen, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen

2025-03-10

Summary

This paper introduces R1-Searcher, a new way to make AI language models better at finding and using information from outside sources when answering questions.

What's the problem?

Current AI models are very good at solving complex problems, but they often rely only on what they already know. This can lead to mistakes or made-up answers when the AI doesn't have enough information, especially for questions about recent events or very specific topics.

What's the solution?

The researchers created R1-Searcher, which uses a technique called reinforcement learning to teach AI models how to search for information on their own. Training happens in two stages: first the model learns when it needs to look up extra information and how to call an external search system, then it learns how to use that retrieved information to give better answers. Unlike other methods, R1-Searcher doesn't need to be trained on specific examples of how to search; it learns on its own through trial and error, guided only by the outcome of its answers.
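To make the idea concrete, here is a minimal sketch of what this kind of search-augmented reasoning loop can look like at inference time. The `<search>`/`<documents>`/`<answer>` tags and the `retrieve()`/`generate()` helpers are illustrative assumptions, not R1-Searcher's actual interface.

```python
# Sketch: a model that decides mid-reasoning when to call an external search
# system. Tag names and helper functions are assumptions for illustration.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder for an external retrieval system (e.g. a document index)."""
    raise NotImplementedError

def generate(prompt: str, stop: list[str]) -> str:
    """Placeholder for LLM decoding that halts at the first stop string."""
    raise NotImplementedError

def answer_with_search(question: str, max_steps: int = 4) -> str:
    prompt = (
        f"Question: {question}\n"
        "Reason step by step. When you need outside knowledge, write the query "
        "inside <search>...</search>; give your final answer inside <answer>...</answer>.\n"
    )
    for _ in range(max_steps):
        # Decode until the model finishes either a search call or its answer.
        chunk = generate(prompt, stop=["</search>", "</answer>"])
        prompt += chunk
        if "<search>" in chunk:
            # The model decided it lacks knowledge: run the query it just wrote.
            query = chunk.split("<search>")[-1].strip()
            docs = retrieve(query)
            prompt += "</search>\n<documents>\n" + "\n".join(docs) + "\n</documents>\n"
        else:
            # The model answered from what it already had in context.
            return chunk.split("<answer>")[-1].strip()
    return prompt  # search budget exhausted: return the full trace for inspection
```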

Why it matters?

This matters because it could make AI assistants much more reliable and up-to-date. By learning to search for information when needed, these AIs could give more accurate answers on a wider range of topics, including current events. This could make AI more useful in fields like education, research, and customer service, where having the most recent and accurate information is crucial.

Abstract

Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarkable performance on challenging tasks such as mathematics and coding, they often rely on their internal knowledge to solve problems, which can be inadequate for time-sensitive or knowledge-intensive questions, leading to inaccuracies and hallucinations. To address this, we propose R1-Searcher, a novel two-stage outcome-based RL approach designed to enhance the search capabilities of LLMs. This method allows LLMs to autonomously invoke external search systems to access additional knowledge during the reasoning process. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start, effectively generalizing to out-of-domain datasets and supporting both Base and Instruct models. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
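For readers curious what a two-stage, outcome-based reward might look like in practice, here is a rough sketch under stated assumptions: stage 1 only checks that the model invokes retrieval and keeps the required output format, while stage 2 scores the final answer, here with a token-level F1. The tag names, weights, and the choice of F1 are illustrative assumptions, not the paper's exact reward definition.

```python
# Sketch of an outcome-only reward computed from a finished rollout (no
# per-step process rewards). Scoring rules and weights are assumptions.
from collections import Counter

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the predicted and gold answers."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

def outcome_reward(stage: int, rollout: str, gold_answer: str) -> float:
    made_search = "<search>" in rollout and "</search>" in rollout
    well_formed = "<answer>" in rollout and "</answer>" in rollout
    if stage == 1:
        # Stage 1: reward the model simply for calling the search tool
        # and producing output in the expected format.
        return (0.5 if made_search else 0.0) + (0.5 if well_formed else 0.0)
    # Stage 2: reward the final answer itself, with a penalty for broken format.
    if not well_formed:
        return -1.0
    answer = rollout.split("<answer>")[-1].split("</answer>")[0]
    return f1_score(answer, gold_answer)
```

Because the reward depends only on the finished rollout, the model is free to discover for itself when a search call helps it reach a correct answer, which is the behavior the abstract describes as autonomously invoking external search systems.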