
How to Train Your LLM Web Agent: A Statistical Diagnosis

Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia

2025-07-09


Summary

This paper examines how to train large language model (LLM) web agents by combining two methods: supervised fine-tuning and on-policy reinforcement learning. Used together, these methods let agents perform better on complex, multi-step web tasks while using less computing power than either method alone.

What's the problem?

The problem is that training LLM web agents to handle realistic multi-step tasks on websites is very expensive and difficult. Relying on a single training method either demands too much computing power or falls short of acceptable performance.

What's the solution?

The researchers studied how best to divide a fixed compute budget between supervised fine-tuning, where the model learns from demonstrations produced by a larger teacher model, and reinforcement learning, where the model improves by practicing on its own. They ran a large sweep of training configurations to find the best way to combine these methods, showing that a two-stage approach, fine-tuning first and then applying reinforcement learning, improves accuracy and reduces training cost.
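To make the two-stage recipe concrete, here is a minimal, hypothetical sketch in PyTorch: a toy policy is first fine-tuned on teacher demonstrations (the supervised stage), then refined with a simple REINFORCE-style on-policy update. Everything here is an illustrative stand-in rather than the paper's actual implementation: the tiny network, the random "teacher" data, the compute_reward placeholder, and the SFT_STEPS/RL_STEPS split are all assumptions made only to show the shape of the pipeline.

```python
# Hypothetical sketch of the two-stage recipe: supervised fine-tuning (SFT)
# on teacher demonstrations, then on-policy RL (REINFORCE). Toy sizes only;
# the paper's real agents are LLMs acting in web environments.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS = 16, 8          # placeholder observation/action spaces
SFT_STEPS, RL_STEPS = 200, 200      # an assumed split of the compute budget

policy = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# --- Stage 1: SFT on demonstrations from a stronger "teacher" model ---
teacher_obs = torch.randn(512, OBS_DIM)                # stand-in teacher states
teacher_actions = torch.randint(0, N_ACTIONS, (512,))  # stand-in teacher actions
for step in range(SFT_STEPS):
    idx = torch.randint(0, 512, (32,))
    logits = policy(teacher_obs[idx])
    loss = F.cross_entropy(logits, teacher_actions[idx])  # imitate the teacher
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: on-policy RL, starting from the SFT checkpoint ---
def compute_reward(obs, actions):
    """Placeholder reward; a real web agent would get task-success signals."""
    return (actions == obs.argmax(dim=-1)).float()

for step in range(RL_STEPS):
    obs = torch.randn(32, OBS_DIM)                     # fresh on-policy states
    dist = torch.distributions.Categorical(logits=policy(obs))
    actions = dist.sample()                            # sample from current policy
    rewards = compute_reward(obs, actions)
    advantage = rewards - rewards.mean()               # simple mean baseline
    loss = -(dist.log_prob(actions) * advantage).mean()  # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
```

The paper's central question is how to split the budget between these two stages; in this sketch that split is just the SFT_STEPS/RL_STEPS knob, chosen arbitrarily here for illustration.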

Why it matters?

This matters because it helps make advanced AI web agents more affordable and effective for real-world uses, such as booking flights or filling out forms online, bringing us closer to smarter AI assistants that can handle complex tasks on the internet.

Abstract

A study on compute allocation for post-training LLM-based web agents finds that combining supervised fine-tuning with on-policy reinforcement learning improves performance and reduces computational costs compared to using either method alone.