WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Xinyue Yang, Jiadai Sun, Yu Yang, Shuntian Yao, Tianjie Zhang, Wei Xu, Jie Tang, Yuxiao Dong

2024-11-05

Summary

This paper introduces WebRL, a framework for training large language model (LLM) web agents with reinforcement learning guided by a self-evolving online curriculum. The goal is to make open models perform well on web-based tasks without relying on expensive proprietary models.

What's the problem?

Many current LLM web agents depend on costly APIs from proprietary models, which limits accessibility and flexibility, while open-source LLMs often lack the decision-making skills these tasks require. Training open agents is also hard for three reasons: there are too few diverse training tasks, feedback signals are sparse (often just whether a task ultimately succeeded), and the agent's behavior drifts during online learning, making it difficult to keep improving over time.

What's the solution?

WebRL addresses these challenges with three pieces: a self-evolving curriculum that generates new training tasks from the agent's previous failures, a robust outcome-supervised reward model (ORM) that judges whether a task was completed, and adaptive reinforcement learning strategies that keep the policy improving consistently as training progresses. The authors applied WebRL to the open Llama-3.1 and GLM-4 models, raising success rates on the WebArena-Lite benchmark from 4.8% to 42.4% for Llama-3.1-8B and from 6.1% to 43% for GLM-4-9B, surpassing even strong proprietary models such as GPT-4-Turbo and GPT-4o.
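To make the idea concrete, below is a minimal Python sketch of how a self-evolving curriculum could be wired together with an outcome-supervised reward model and a policy update. It is an illustration under assumed interfaces, not the authors' implementation: names such as run_episode, generate_variants, policy.update, and orm.score are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch of a self-evolving curriculum RL loop in the spirit of
# WebRL. All interfaces (run_episode, generate_variants, policy.update,
# orm.score) are assumptions made for illustration, not the paper's code.

def self_evolving_curriculum_rl(
    policy,                                   # open LLM web agent; assumed to expose .update(rollouts)
    orm,                                      # outcome-supervised reward model; assumed .score(task, traj) -> 0.0 or 1.0
    seed_tasks: List[str],                    # initial pool of web tasks (e.g., natural-language instructions)
    run_episode: Callable[[object, str], list],            # rolls the agent out on one task, returns a trajectory
    generate_variants: Callable[[List[str]], List[str]],   # proposes new tasks from failed ones
    num_phases: int = 10,
):
    """Run several phases of online RL in which the task set evolves from failures."""
    tasks = list(seed_tasks)
    for _ in range(num_phases):
        rollouts: List[Tuple[str, list, float]] = []
        failed_tasks: List[str] = []

        # 1) Roll out the current policy on the curriculum's tasks and score each
        #    trajectory with the ORM, which only signals overall success or failure.
        for task in tasks:
            trajectory = run_episode(policy, task)
            reward = orm.score(task, trajectory)
            rollouts.append((task, trajectory, reward))
            if reward < 1.0:
                failed_tasks.append(task)

        # 2) Adaptive RL update: in WebRL this is a constrained policy update meant
        #    to limit distribution drift; here it is abstracted into a method call.
        policy.update(rollouts)

        # 3) Self-evolve the curriculum: turn failures into new, related tasks for
        #    the next phase, so the agent keeps training near its current frontier.
        tasks = generate_variants(failed_tasks) or tasks

    return policy
```

The design choice the sketch highlights is that failed tasks are not discarded: they seed the next phase's curriculum, so the agent continually trains on tasks just beyond its current ability.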

Why it matters?

This research is important because it makes powerful web agents more accessible by building them from open-source models instead of expensive proprietary ones. By improving how these agents learn and adapt to new tasks, WebRL can make them more effective in real-world applications such as customer service and online assistance, paving the way for more capable autonomous systems that interact with users on the web.

Abstract

Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.