A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette

2026-03-23

Summary

This paper focuses on improving how well AI agents, powered by large language models, complete complex, multi-step digital tasks, such as navigating websites and operating software.

What's the problem?

Current AI agents struggle with tasks that require many steps: they lose track as the content on the screen changes, and they find it hard to learn which actions actually lead to success. Imagine trying to follow a long set of instructions while someone keeps adding new steps – you'd likely lose track! This is especially true when the AI is learning through trial and error, because a single reward at the end of a long task doesn't tell it *why* it succeeded or failed.

What's the solution?

The researchers developed two main improvements. First, they created a planning system in which the AI breaks each big goal down into smaller, manageable subgoals and works through them one at a time, revising the plan as new information arrives. Second, they designed a new way to train the AI (called MiRA) using frequent 'checkpoints' or milestones, giving it more immediate feedback on whether it's on the right track. Together, these help the AI learn more effectively and stay focused on the overall goal.
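The subgoal-driven loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the function names (`plan_subgoals`, `execute`, `run_agent`) are hypothetical, and the planner and executor are stubs standing in for LLM calls.

```python
def plan_subgoals(goal):
    """Stand-in for an LLM planner: decompose a goal into ordered subgoals.

    A real system would prompt a model (e.g. Gemini, per the paper) here.
    """
    return [f"{goal}: step {i}" for i in range(1, 4)]

def execute(subgoal, observation):
    """Stand-in for the low-level executor that acts on the current page."""
    return f"done({subgoal})"

def run_agent(goal):
    """Work through subgoals one at a time, tracking what has happened."""
    history = []
    for sg in plan_subgoals(goal):
        observation = history[-1] if history else "initial page"
        history.append(execute(sg, observation))
        # A real agent would re-plan here if the observation were unexpected,
        # which is what keeps it from losing track on long tasks.
    return history

print(run_agent("book a flight"))
```

The key design point is that the executor only ever reasons about the current subgoal, while the planner maintains the path toward the final goal.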

Why it matters?

These advancements significantly boost the performance of AI agents, allowing an open 12B-parameter model to handle more complex tasks and even outperform proprietary systems such as GPT-4-Turbo and GPT-4o. This is a big step towards creating truly helpful and reliable AI assistants that can automate tasks and interact with the digital world more effectively.

Abstract

Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing LLM-based agents struggle with long-horizon planning in two main ways. During online execution, they often lose track as new information arrives, lacking a clear and adaptive path toward the final goal. This issue is further exacerbated during reinforcement learning (RL) fine-tuning, where sparse and delayed rewards make it difficult for agents to identify which actions lead to success, preventing them from maintaining coherent reasoning over extended tasks. To address these challenges, we propose two contributions. First, we introduce an agent framework that leverages proprietary models for online planning through subgoal decomposition. Second, we present MiRA (Milestoning your Reinforcement Learning Enhanced Agent), an RL training framework that uses dense, milestone-based reward signals. The real-time planning mechanism yields an approximately 10% absolute increase in success rate (SR) for proprietary models such as Gemini on the WebArena-Lite benchmark. Meanwhile, applying MiRA to the open Gemma3-12B model increases its success rate from 6.4% to 43.0%. This performance surpasses proprietary systems such as GPT-4-Turbo (17.6%) and GPT-4o (13.9%), as well as the previous open-model state of the art, WebRL (38.4%). Overall, our findings demonstrate that combining explicit inference-time planning with milestone-based rewards significantly improves an agent's long-horizon capabilities, paving the way for more robust and general-purpose autonomous systems.
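To make the contrast between sparse and milestone-based rewards concrete, here is an illustrative sketch of reward shaping in the spirit of MiRA. This is not the paper's actual implementation: the milestone matching (simple substring checks on state descriptions) and the function name are assumptions made for illustration only.

```python
def milestone_rewards(trajectory, milestones):
    """Return a per-step reward: +1 the first time each milestone is reached.

    With a sparse reward, every step before the final one would get 0 and
    credit assignment over a long trajectory becomes difficult; milestones
    spread the signal across the episode instead.
    """
    reached = set()
    rewards = []
    for state in trajectory:
        r = 0.0
        for m in milestones:
            # Toy matching rule: a milestone counts as hit if its name
            # appears in the state description (a real system would use a
            # learned or rule-based checker).
            if m in state and m not in reached:
                reached.add(m)
                r += 1.0
        rewards.append(r)
    return rewards

# A toy web-shopping trajectory with three milestones along the way.
traj = ["home", "search results", "item page", "cart", "checkout done"]
ms = ["search results", "cart", "checkout done"]
print(milestone_rewards(traj, ms))  # [0.0, 1.0, 0.0, 1.0, 1.0]
```

Under a sparse scheme the same trajectory would yield reward only at the final step, so the dense version gives the RL fine-tuning loop far more frequent feedback about which actions made progress.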