Learning to Retrieve from Agent Trajectories

Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen

2026-04-08

Summary

This paper explores how to improve search results when those results aren't being used by people directly, but by AI agents that are trying to accomplish tasks. It argues that traditional search methods, built around how humans search, don't work as well for these AI agents.

What's the problem?

Current search systems are trained using data from how people click on and use search results. However, AI agents search and consume information very differently from humans. They often work through multiple steps, examine many documents, and carry out a 'reasoning' process that isn't reflected in simple clicks or time spent on a page. As a result, search systems optimized for humans are less effective when used by AI agents, leading to less accurate results and slower task completion.

What's the solution?

The researchers propose a new way to train search systems specifically for AI agents. Instead of relying on human data, they analyze the complete 'path' an agent takes while searching: which documents it looks at, which ones it ignores, and how it uses the information it finds. They develop a framework called LRAT that uses these agent 'trajectories' to learn what makes a document useful to an agent, and then uses that information to improve search results. It also considers *how much* an agent uses a document, not just *that* it uses it.
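The trajectory-mining idea can be sketched in a few lines of code. This is a minimal illustration only: the `Step` structure, the signal names, and the labeling heuristic below are assumptions made for clarity, not LRAT's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One search step in a hypothetical agent trajectory."""
    query: str
    retrieved: list                                   # doc ids the retriever returned
    browsed: list                                     # doc ids the agent chose to open
    cited_after: list = field(default_factory=list)   # docs used in later reasoning

def mine_supervision(trajectory):
    """Turn a trajectory into (query, doc, weight) training triples.

    Illustrative labeling heuristic:
      - browsed AND later cited  -> strong positive (weight 1.0)
      - browsed but never cited  -> weak positive   (weight 0.5)
      - retrieved but unbrowsed  -> hard negative   (weight 0.0)
    """
    triples = []
    for step in trajectory:
        for doc in step.retrieved:
            if doc in step.browsed and doc in step.cited_after:
                triples.append((step.query, doc, 1.0))
            elif doc in step.browsed:
                triples.append((step.query, doc, 0.5))
            else:
                triples.append((step.query, doc, 0.0))
    return triples

traj = [Step(query="agentic retrieval", retrieved=["d1", "d2", "d3"],
             browsed=["d1", "d2"], cited_after=["d1"])]
print(mine_supervision(traj))
# → [('agentic retrieval', 'd1', 1.0), ('agentic retrieval', 'd2', 0.5), ('agentic retrieval', 'd3', 0.0)]
```

The graded weights capture the summary's point that it matters *how much* an agent uses a document, not just whether it touched it.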

Why it matters?

This work is important because AI agents are becoming increasingly common in many applications, from answering questions to conducting research. If these agents can't find the information they need efficiently, it limits their capabilities. By training search systems directly from how agents behave, this research promises to make AI agents more effective and reliable, paving the way for more powerful AI applications.

Abstract

Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than human beings, and is embedded as a core component within multi-turn reasoning and action loops. In this setting, retrieval models trained under human-centric assumptions exhibit a fundamental mismatch with the way agents issue queries and consume results. In this work, we argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, where supervision is derived from multi-step agent interactions. Through a systematic analysis of search agent trajectories, we identify key behavioral signals that reveal document utility, including browsing actions, unbrowsed rejections, and post-browse reasoning traces. Guided by these insights, we propose LRAT, a simple yet effective framework that mines high-quality retrieval supervision from agent trajectories and incorporates relevance intensity through weighted optimization. Extensive experiments on both in-domain and out-of-domain deep research benchmarks demonstrate that retrievers trained with LRAT consistently improve evidence recall, end-to-end task success, and execution efficiency across diverse agent architectures and scales. Our results highlight agent trajectories as a practical and scalable supervision source, pointing to a promising direction for retrieval in the era of agentic search.
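One plausible reading of "incorporates relevance intensity through weighted optimization" is a contrastive objective where each positive document's contribution is scaled by its mined utility weight. The stdlib-only sketch below shows that idea; the specific weighting scheme and temperature are assumptions, not the paper's exact objective.

```python
import math

def weighted_infonce(pos_scores, pos_weights, neg_scores, temperature=0.05):
    """Weighted InfoNCE-style loss for one query's candidate documents.

    Each positive contributes -w * log softmax(score) against a shared
    negative pool, so strongly-used documents (higher w) drive training harder.
    """
    neg_exp = sum(math.exp(s / temperature) for s in neg_scores)
    loss = 0.0
    for s, w in zip(pos_scores, pos_weights):
        p = math.exp(s / temperature)
        loss += -w * math.log(p / (p + neg_exp))
    return loss / max(sum(pos_weights), 1e-9)

# A well-ranked positive (scored above the negatives) yields a small loss;
# the same positive scored below the negatives yields a much larger one.
low = weighted_infonce([0.9], [1.0], [0.1, 0.0])
high = weighted_infonce([0.1], [1.0], [0.9, 0.8])
assert low < high
```

Unbrowsed rejections fit naturally here as hard negatives in `neg_scores`, since the agent saw them in the ranked list and chose not to open them.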