Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Qiaoyu Tang, Hao Xiang, Le Yu, Bowen Yu, Yaojie Lu, Xianpei Han, Le Sun, WenJuan Zhang, Pengbo Wang, Shixuan Liu, Zhenru Zhang, Jianhong Tu, Hongyu Lin, Junyang Lin

2025-10-10

Summary

This paper introduces DeepMiner, a new system designed to make AI agents better at complex, multi-step reasoning tasks, like those you'd encounter when searching the web to answer a difficult question.

What's the problem?

Current AI models trained through trial and error, a technique called reinforcement learning, struggle with problems that require many steps and remembering information across a long interaction. They often can't build on earlier reasoning to reach a final answer, especially when dealing with lots of information. A key obstacle is the model's fixed context window: during a long interaction, the accumulated conversation and retrieved information eventually exceed what the model can hold, so existing systems lose track of what has been said or discovered.

What's the solution?

The researchers created DeepMiner, which tackles this problem in two main ways. First, it generates genuinely challenging practice questions by working backwards from real web pages, a "reverse construction" method: information from authentic sources is turned into a puzzle whose answer can still be verified, forcing the AI to actually *reason* rather than pattern-match. Second, DeepMiner manages the conversation history with a sliding window, so the AI keeps the most recent, relevant context without needing an external summarization model or other complicated machinery. Applying this training to the large language model Qwen3-32B produced DeepMiner-32B.
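The paper does not publish its exact truncation rule, but the sliding-window idea can be sketched as follows: keep the system prompt, then fill the remaining budget with the most recent turns and let older ones simply fall out of the window, with no summarizer involved. All names, the `Turn` type, and the character-based budget here are illustrative assumptions, not the authors' implementation (a real system would count tokens, not characters).

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str   # e.g. "assistant" or "tool" (illustrative roles)
    text: str

def sliding_window_context(system_prompt: str, turns: list, budget: int = 32_000):
    """Assemble the prompt from the system message plus the most recent
    turns that fit in the budget (approximated here as characters).
    Older turns are dropped, not summarized."""
    kept = []
    used = len(system_prompt)
    for t in reversed(turns):          # walk backwards from the newest turn
        if used + len(t.text) > budget:
            break                      # everything older falls out of the window
        kept.append(t)
        used += len(t.text)
    return [system_prompt] + [t.text for t in reversed(kept)]
```

With a tiny budget, only the newest turns survive while the system prompt is always retained, which is the property that lets an agent keep interacting long after the raw history has outgrown the context length.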

Why it matters?

DeepMiner significantly improves the performance of AI agents on tasks that require complex reasoning and long-term memory. It reached 33.5% accuracy on the BrowseComp-en web search benchmark, almost 20 percentage points above the previous best open-source agent, with consistent gains on several other benchmarks. Just as importantly, it sustains much longer interactions, nearly 100 turns within a standard 32k context window, without losing track of the context. This is a big step towards AI assistants that can truly help with complicated tasks requiring sustained thought and information gathering.

Abstract

While recent advances in reasoning models have demonstrated cognitive behaviors through reinforcement learning, existing approaches struggle to invoke deep reasoning capabilities in multi-turn agents with long-horizon interactions. We propose DeepMiner, a novel framework that elicits such abilities by introducing high-difficulty training tasks and dynamic context window. DeepMiner presents a reverse construction method to generate complex but verifiable question-answer pairs from authentic web sources, which ensures the challenge and reliability of training data while injecting cognitive capabilities into multi-turn reasoning scenarios. We further design an elegant yet effective dynamic context management strategy for both training and inference, utilizing sliding window mechanisms while eliminating the dependency on external summarization models, thereby efficiently empowering the model to handle continuously expanding long-horizon contexts. Through reinforcement learning on Qwen3-32B, we develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks. DeepMiner attains 33.5% accuracy on BrowseComp-en, surpassing the previous best open-source agent by almost 20 percentage points, and demonstrates consistent improvements on BrowseComp-zh, XBench-DeepSearch, and GAIA. Notably, our dynamic context management enables sustained interactions of nearly 100 turns within standard 32k context length, effectively addressing the context limitations that constrain existing multi-turn interaction systems.