IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, Kuan Li, Minpeng Liao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

2025-11-11

Summary

This paper introduces a new way for AI agents to do complex research tasks, specifically focusing on how they gather and use information over a long period of time.

What's the problem?

Current AI research agents struggle with long tasks because they try to remember *everything* they find in one big chunk of information. This leads to the system getting overwhelmed with too much data, making it hard to focus on what's important and ultimately hurting its performance. It's like trying to write a research paper while constantly adding notes without ever organizing them – things get messy and you lose track of your main argument.

What's the solution?

The researchers developed a system called IterResearch that works more like a human researcher. Instead of keeping one massive record, it works in cycles: it explores, writes a summary report, then uses that report as its memory for the next round of exploration. This evolving report acts as a constantly updated understanding of the topic, so the agent's working context stays a fixed size no matter how long the task runs. They also developed a training method called Efficiency-Aware Policy Optimization (EAPO), which rewards the agent for finding useful information in fewer steps (via geometric reward discounting) and uses adaptive downsampling to keep reinforcement learning stable when training is distributed across many machines.
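The cycle described above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not the paper's implementation: the functions `decide`, `search`, and `update_report` are hypothetical stand-ins for the language model and its tools. The key point is that each round's workspace is rebuilt from the question, the evolving report, and the latest results, never from the full interaction history.

```python
def iter_research(question, decide, update_report, search, max_rounds=8):
    """Sketch of an IterResearch-style loop (hypothetical helpers).

    The workspace is reconstructed every round from (question, report,
    latest results) -- a Markovian state -- instead of appending to an
    ever-growing transcript.
    """
    report = ""       # evolving report: the only memory carried forward
    results = None
    for _ in range(max_rounds):
        workspace = (question, report, results)   # rebuilt each round
        action, payload = decide(workspace)
        if action == "answer":
            return payload                        # answer from the report
        results = search(payload)                 # explore
        report = update_report(report, results)   # synthesize, drop noise
    return report

# Toy stand-ins to show the loop terminating once the report suffices:
def toy_decide(ws):
    question, report, _ = ws
    if "fact" in report:
        return "answer", report
    return "search", question

def toy_search(query):
    return f"raw results for {query!r}"

def toy_update(report, results):
    return report + " fact"   # keep only a distilled summary
```

With the toy stand-ins, the loop searches once, folds the result into the report, and answers on the next round, so the context passed to `decide` never grows with the number of rounds.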

Why it matters?

This work is important because it significantly improves the ability of AI agents to tackle complex, long-term research problems. It not only makes the AI better at doing the research itself, but also provides a new technique – the iterative report-writing process – that can be used to improve even the most advanced AI models, making them more capable and reliable for tasks requiring deep reasoning and knowledge gathering.

Abstract

Recent advances in deep-research agents have shown promise for autonomous knowledge construction through dynamic reasoning over external sources. However, existing approaches rely on a mono-contextual paradigm that accumulates all information in a single, expanding context window, leading to context suffocation and noise contamination that limit their effectiveness on long-horizon tasks. We introduce IterResearch, a novel iterative deep-research paradigm that reformulates long-horizon research as a Markov Decision Process with strategic workspace reconstruction. By maintaining an evolving report as memory and periodically synthesizing insights, our approach preserves consistent reasoning capacity across arbitrary exploration depths. We further develop Efficiency-Aware Policy Optimization (EAPO), a reinforcement learning framework that incentivizes efficient exploration through geometric reward discounting and enables stable distributed training via adaptive downsampling. Extensive experiments demonstrate that IterResearch achieves substantial improvements over existing open-source agents with average +14.5pp across six benchmarks and narrows the gap with frontier proprietary systems. Remarkably, our paradigm exhibits unprecedented interaction scaling, extending to 2048 interactions with dramatic performance gains (from 3.5% to 42.5%), and serves as an effective prompting strategy, improving frontier models by up to 19.2pp over ReAct on long-horizon tasks. These findings position IterResearch as a versatile solution for long-horizon reasoning, effective both as a trained agent and as a prompting paradigm for frontier models.
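The abstract credits EAPO's efficiency incentive to geometric reward discounting. The paper's exact objective is not reproduced here, but the core idea, that the same terminal reward is worth less the more interactions were spent earning it, can be sketched as follows; `gamma` is an assumed hyperparameter, not a value from the paper.

```python
def discounted_reward(final_reward, num_interactions, gamma=0.95):
    """Geometric discounting sketch: scale a terminal task reward by
    gamma raised to the number of tool interactions used.

    An agent that reaches the same answer in fewer steps receives a
    strictly larger reward, which incentivizes efficient exploration.
    """
    return final_reward * (gamma ** num_interactions)
```

For example, a correct answer found in 5 interactions earns more than the same answer found in 20, even though the terminal reward is identical before discounting.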