QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Zongyu Lin, Yao Tang, Xingcheng Yao, Da Yin, Ziniu Hu, Yizhou Sun, Kai-Wei Chang

2025-02-05

Summary

This paper introduces QLASS, a method that helps language AI agents solve complex problems by breaking them into smaller steps and guiding each step with learned scores called Q-values, which estimate the long-term payoff of each possible action. This step-level guidance helps agents make better decisions and complete tasks more efficiently.

What's the problem?

Language AI agents often struggle with solving complex tasks because they rely on a single reward at the end of the process, which doesn't provide enough feedback for improving their decision-making at each step. This leads to less effective solutions and wasted effort.

What's the solution?

QLASS addresses this with a step-by-step guidance system that assigns scores, called Q-values, to the candidate actions available at each step of a task. These scores estimate the long-term value of each action, helping the agent choose the most promising path forward. The researchers obtained these Q-values by building a reasoning tree of explored trajectories and propagating outcome rewards back through it, which let QLASS maintain strong performance even with roughly half the annotated training data.
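The step-by-step guidance described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `q_value` stands in for a learned Q-value model, and `propose_actions` stands in for the language agent sampling candidate actions; both are hypothetical toy functions introduced here for clarity.

```python
import hashlib

def q_value(state: str, action: str) -> float:
    """Stand-in for a learned Q-value model (hypothetical toy scorer).

    Hashes the (state, action) pair to a deterministic pseudo-score in [0, 1).
    In QLASS this would be a trained model estimating long-term value.
    """
    digest = hashlib.md5(f"{state}|{action}".encode()).hexdigest()
    return int(digest, 16) / 16**32

def propose_actions(state: str, k: int = 4) -> list[str]:
    """Stand-in for the agent sampling k candidate actions at this step."""
    return [f"{state}->a{i}" for i in range(k)]

def q_guided_step(state: str, k: int = 4) -> str:
    """Score each candidate action and follow the highest-Q one."""
    candidates = propose_actions(state, k)
    return max(candidates, key=lambda a: q_value(state, a))

def q_guided_rollout(start: str, steps: int = 3) -> list[str]:
    """Roll out a trajectory, greedily picking the best-scored action per step."""
    trajectory = [start]
    state = start
    for _ in range(steps):
        state = q_guided_step(state)
        trajectory.append(state)
    return trajectory
```

The key contrast with outcome-reward approaches is visible in `q_guided_step`: the agent gets a score for every intermediate choice rather than a single signal at the end of the trajectory.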

Why it matters?

QLASS is important because it allows AI agents to solve problems more accurately and efficiently by focusing on long-term success instead of just immediate results. This advancement could lead to smarter AI systems capable of handling more complex real-world tasks.

Abstract

Language agents have become a promising solution to complex interactive tasks. One of the key ingredients to the success of language agents is the reward model on the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, due to the lack of annotations of intermediate interactions, most existing works use an outcome reward model to optimize policies across entire trajectories. This may lead to sub-optimal policies and hinder the overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search), to automatically generate annotations by estimating Q-values in a stepwise manner for open language agents. By introducing a reasoning tree and performing process reward modeling, QLASS provides effective intermediate guidance for each step. With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value, resulting in significant performance improvement during model inference on complex interactive agent tasks. Notably, even with almost half the annotated data, QLASS retains strong performance, demonstrating its efficiency in handling limited supervision. We also empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis. We will release our code and data.
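The abstract's idea of estimating Q-values from a reasoning tree can be sketched with a standard Bellman-style backup: terminal rewards at the leaves are discounted back through the tree so every (state, action) edge gets a stepwise score. This is one common formulation, assumed here for illustration; the paper's exact recursion and training procedure may differ, and the tree and reward values below are made up.

```python
def compute_q(tree: dict, rewards: dict, node: str, gamma: float = 0.95):
    """Back up leaf rewards through a reasoning tree.

    tree:    maps a node to a list of (action, child_node) edges.
    rewards: maps leaf nodes to their outcome reward.
    Returns (q, v): q maps (node, action) -> Q-value; v is the node's value,
    i.e. the best discounted outcome reachable beneath it.
    Assumes rewards occur only at the leaves (terminal outcomes).
    """
    children = tree.get(node, [])
    if not children:  # leaf: its value is just the outcome reward
        return {}, rewards.get(node, 0.0)
    q = {}
    best = float("-inf")
    for action, child in children:
        child_q, child_v = compute_q(tree, rewards, child, gamma)
        q.update(child_q)
        # No intermediate reward assumed: Q(s, a) = gamma * V(s')
        q[(node, action)] = gamma * child_v
        best = max(best, q[(node, action)])
    return q, best

# Toy tree: root -> {a1 -> n1 -> a3 -> leaf l1 (reward 1), a2 -> leaf n2 (reward 0)}
tree = {"root": [("a1", "n1"), ("a2", "n2")], "n1": [("a3", "l1")]}
rewards = {"l1": 1.0, "n2": 0.0}
q, v = compute_q(tree, rewards, "root")
```

Edges leading toward the rewarding leaf receive higher Q-values than dead-end edges, which is exactly the intermediate signal an outcome-only reward model cannot provide.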