Agent Workflow Memory

Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig

2024-09-12

Summary

This paper introduces Agent Workflow Memory (AWM), a method designed to help language model-based agents perform complex tasks more effectively by learning reusable workflows from past experiences and applying them to new tasks.

What's the problem?

Current language model agents struggle with long and complicated tasks that require multiple steps. Unlike humans, who can learn from previous experiences and apply that knowledge to new situations, these agents often fail to navigate complex action sequences efficiently.

What's the solution?

To address this, the authors developed AWM, which induces commonly reused routines, called workflows, from an agent's experiences and selectively provides them to the agent to guide future actions. AWM works in both offline and online settings: agents can learn workflows from training examples before a task, or adaptively from test queries on the fly. The researchers evaluated AWM on two major web navigation benchmarks, Mind2Web and WebArena, where it substantially improved agents' success rates while reducing the number of steps needed to complete tasks.
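The core loop described above can be sketched in a few lines: mine recurring action sub-sequences from successful past trajectories, store them as workflows, and surface them to the agent as context. This is a minimal illustration, not the paper's actual implementation; all names (`WorkflowMemory`, `induce_workflows`, the string action format) are hypothetical, and real AWM induces richer, parameterized workflows with a language model.

```python
# Hedged sketch of a workflow-memory loop (illustrative names, not the paper's API).
from collections import Counter


def induce_workflows(trajectories, min_support=2, length=2):
    """Find action sub-sequences of a given length that recur across
    at least `min_support` successful trajectories."""
    counts = Counter()
    for actions in trajectories:
        # Count each sub-sequence once per trajectory.
        seen = {tuple(actions[i:i + length])
                for i in range(len(actions) - length + 1)}
        counts.update(seen)
    return [list(seq) for seq, c in counts.items() if c >= min_support]


class WorkflowMemory:
    """Stores induced workflows and renders them as prompt context."""

    def __init__(self):
        self.workflows = []

    def update(self, trajectories):
        # Add newly induced workflows, skipping duplicates.
        for wf in induce_workflows(trajectories):
            if wf not in self.workflows:
                self.workflows.append(wf)

    def as_prompt(self):
        # Text block an agent could prepend to its prompt.
        lines = ["Reusable workflows:"]
        for i, wf in enumerate(self.workflows, 1):
            lines.append(f"{i}. " + " -> ".join(wf))
        return "\n".join(lines)
```

For example, given two successful web-navigation trajectories that both start with `click search box` then `type query`, that shared prefix would be induced as a workflow and offered to the agent on future tasks, which is how reuse can cut the number of steps taken.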

Why it matters?

This research is important because it enhances the ability of AI agents to perform real-world tasks more effectively. By enabling agents to learn from past experiences and apply that knowledge, AWM could lead to smarter and more efficient AI systems in areas like web navigation, online shopping, and other interactive applications.

Abstract

Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.