Revealing the Barriers of Language Agents in Planning
Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang, Yikai Zhang, Lei Li, Yanghua Xiao
2024-10-17

Summary
This paper explores the challenges that language agents face in planning tasks, particularly why they struggle to achieve human-level planning abilities despite advancements in artificial intelligence.
What's the problem?
Language agents, which are AI systems designed to understand and generate human-like text, have difficulty with planning tasks. Even state-of-the-art reasoning models such as OpenAI's o1 perform poorly on complex real-world planning benchmarks, reaching only a 15.6% success rate on one of them. This raises the question of what prevents these agents from planning as effectively as humans.
What's the solution?
The authors conducted a feature attribution study to identify the main causes of these planning failures. They found two key factors that hinder effective planning: constraints (the conditions a valid plan must satisfy) play only a limited role in the agent's decisions, and the influence of the original question diminishes as planning proceeds, so the agent gradually loses focus on what it was asked to do. Although existing strategies improve performance, they mitigate rather than resolve these underlying problems.
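The paper's own attribution method is not reproduced here, but the core idea can be illustrated with a simple ablation-style sketch: compare how likely a language model finds a fixed plan when the constraint segment is included in the prompt versus when it is removed. The model name ("gpt2"), the toy prompt and plan strings, and the helper plan_log_likelihood are illustrative assumptions, not the authors' setup.

```python
# Ablation-style attribution sketch (assumption: not the paper's code).
# Idea: if dropping the constraints barely changes the likelihood of the plan,
# the constraints played a limited role in shaping it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def plan_log_likelihood(prompt: str, plan: str) -> float:
    """Sum of log-probabilities the model assigns to the plan tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + plan, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs over next tokens, predicted from the preceding context.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Score only the plan tokens (positions after the prompt).
    plan_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(log_probs[i, full_ids[0, i + 1]].item() for i in plan_positions)

# Toy example (hypothetical task, not from the benchmark used in the paper).
question = "Plan a 3-day trip to Paris."
constraints = " Budget: $800. Must include one museum per day."
plan = " Day 1: Louvre. Day 2: Musee d'Orsay. Day 3: Pompidou."

with_constraints = plan_log_likelihood(question + constraints, plan)
without_constraints = plan_log_likelihood(question, plan)
# A value near zero suggests the constraints contributed little to the plan.
print(f"attribution of constraints: {with_constraints - without_constraints:.3f}")
```

A similar ablation over the question segment, repeated at different points in a long plan, would illustrate the second finding: the question's influence shrinking as the generated plan grows.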
Why it matters?
This research is important because it sheds light on the limitations of current language agents in performing complex tasks. Understanding these barriers can guide future research and development, leading to more capable AI systems that can plan and reason at a level closer to human intelligence.
Abstract
Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning by automatically generating reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues and the mechanisms and limitations of the strategies proposed to address them remain insufficiently understood. In this work, we apply a feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.