Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Hyungjoo Chae, Sunghwan Kim, Junhee Cho, Seungone Kim, Seungjun Moon, Gyeom Hwangbo, Dongha Lim, Minjin Kim, Yeonjun Hwang, Minju Gwak, Dongwook Choi, Minseok Kang, Gwanhoon Im, ByeongUng Cho, Hyojun Kim, Jun Hee Han, Taeyoon Kwon, Minju Kim, Beong-woo Kwak, Dongjin Kang, Jinyoung Yeo

2025-05-22

Summary

This paper introduces Web-Shepherd, a system that helps AI agents learn to navigate websites more accurately and efficiently by giving them better feedback at each step.

What's the problem?

AI agents that complete tasks on websites often make mistakes or take unnecessary steps. Judging their actions at a fine-grained level is hard, and current models are not good at giving useful feedback on each individual move.

What's the solution?

The researchers created Web-Shepherd, a process reward model (PRM) that evaluates each step an AI agent takes while navigating a website. Step-level evaluation makes mistakes easier to spot and gives the agent a more useful learning signal. The authors report better accuracy at lower cost than existing multimodal large language models.
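
To make the idea of step-level evaluation concrete, here is a minimal sketch in Python. It is not Web-Shepherd's implementation: the `Step` type, the `toy_reward` keyword heuristic, and the 0.5 threshold are all hypothetical stand-ins chosen for illustration. A real process reward model would be a learned model rather than a keyword check. The sketch only shows the general shape of scoring every step of a trajectory and locating the first weak one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One step of a web-navigation trajectory (hypothetical structure)."""
    observation: str  # what the agent saw, e.g. a short page description
    action: str       # what the agent did


def score_trajectory(steps: List[Step], reward_fn: Callable[[Step], float]) -> List[float]:
    """Assign one reward per step, instead of one reward for the whole episode."""
    return [reward_fn(step) for step in steps]


def first_weak_step(scores: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first step scoring below the threshold, or None."""
    for index, score in enumerate(scores):
        if score < threshold:
            return index
    return None


def toy_reward(step: Step) -> float:
    """Placeholder for a learned reward model (assumption: keyword match only)."""
    task_keywords = ("search", "shoes")
    action = step.action.lower()
    return 0.9 if any(keyword in action for keyword in task_keywords) else 0.1


# Hypothetical trajectory for the task "search for shoes".
trajectory = [
    Step("home page", "type 'shoes' into search box"),
    Step("home page", "click 'Careers' link"),
    Step("home page", "click 'Search' button"),
]

scores = score_trajectory(trajectory, toy_reward)
weak_index = first_weak_step(scores)
print(scores, weak_index)
```

Because each step receives its own score, the off-task click on the second step can be identified directly, rather than being hidden inside a single success or failure signal for the whole episode.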

Why does it matter?

This matters because smarter web agents can help automate online tasks, assist users, and make digital services more accessible, all while being more reliable and cost-effective.

Abstract

The paper introduces Web-Shepherd, a process reward model for web navigation, which improves accuracy and cost-effectiveness in step-level trajectory assessment compared to existing multimodal large language models.