Generating Symbolic World Models via Test-time Scaling of Large Language Models
Zhouliang Yu, Yuhuan Yuan, Tim Z. Xiao, Fuxiang Frank Xia, Jie Fu, Ge Zhang, Ge Lin, Weiyang Liu
2025-02-10
Summary
This paper presents a way to make AI systems better at planning complex tasks: it uses a formal language called PDDL (Planning Domain Definition Language) and improves how large language models (LLMs) generate it.
What's the problem?
Current AI models struggle with complex planning because natural language leaves the rules and constraints of a task ambiguous. They also have trouble writing PDDL descriptions, which state planning problems precisely, because there is very little PDDL data for them to learn from.
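To make the "more precise" contrast concrete, here is a tiny hand-written PDDL fragment (a sketch, not taken from the paper) showing how a single action's preconditions and effects are stated formally rather than in prose:

```pddl
(define (domain toy-gripper)
  (:requirements :strips)
  (:predicates (at-robot ?r) (at-ball ?b ?r) (hand-free) (carrying ?b))
  ; The robot may pick up a ball only if it is in the same room
  ; and its hand is empty; the effect updates the state atoms.
  (:action pick-up
    :parameters (?b ?r)
    :precondition (and (at-robot ?r) (at-ball ?b ?r) (hand-free))
    :effect (and (carrying ?b) (not (at-ball ?b ?r)) (not (hand-free)))))
```

Because every precondition and effect is explicit, a planner can check rule violations mechanically instead of guessing at natural-language intent.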
What's the solution?
The researchers developed a method that makes LLMs better at writing PDDL without any additional training data. It relies on test-time scaling: the model first generates multiple candidate solutions and keeps the best one (Best-of-N sampling), then refines that candidate step by step. This helps the AI produce more accurate PDDL models of planning problems.
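The control flow of Best-of-N sampling followed by iterative refinement can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `sample_candidate`, `score`, and `refine` are hypothetical stand-ins (here mocked with numbers) for an LLM sampler, a verifier, and a feedback-driven refinement step.

```python
import random

TARGET = 100  # stand-in for a "perfect" solution a verifier would accept

def sample_candidate(rng):
    """Mock of drawing one candidate solution from an LLM."""
    return rng.randint(0, 200)

def score(candidate):
    """Mock verifier: higher is better (e.g., fraction of checks passed)."""
    return -abs(candidate - TARGET)

def refine(candidate):
    """Mock of one fine-grained refinement step using verifier feedback."""
    if candidate < TARGET:
        return candidate + 1
    if candidate > TARGET:
        return candidate - 1
    return candidate

def best_of_n_then_refine(n=8, refine_steps=5, seed=0):
    rng = random.Random(seed)
    # Stage 1 (Best-of-N): sample N candidates, keep the highest-scoring one.
    best = max((sample_candidate(rng) for _ in range(n)), key=score)
    # Stage 2 (refinement): improve the winner step by step,
    # keeping a refinement only if the verifier score does not drop.
    for _ in range(refine_steps):
        improved = refine(best)
        if score(improved) >= score(best):
            best = improved
    return best
```

The two stages are complementary: Best-of-N raises the quality of the starting point cheaply, while refinement makes small targeted corrections that pure resampling is unlikely to hit.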
Why does it matter?
This matters because it allows AI systems to tackle more complex planning tasks more effectively. By using PDDL, the AI can work with classic planning algorithms to find optimal solutions. This could lead to better AI performance in areas like robotics, logistics, and any field that requires careful planning and problem-solving.
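The payoff of a symbolic world model is that classic optimal search applies directly. Below is a generic A* sketch run on a toy 5x5 grid standing in for a symbolic state space; it is an illustration of the search side only, not the paper's planner, and `grid_neighbors` and `manhattan` are invented for the example.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Generic A*: returns a minimum-cost path from start to goal, or None.

    `neighbors(state)` yields (next_state, step_cost) pairs; the plan is
    optimal when `heuristic` is admissible (never overestimates).
    """
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to reach each state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for nxt, cost in neighbors(state):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt, path + [nxt]))
    return None

def grid_neighbors(state):
    """Toy transition model: unit-cost moves on a 5x5 grid."""
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < 5 and 0 <= ny < 5:
            yield (nx, ny), 1

def manhattan(state):
    """Admissible heuristic: Manhattan distance to the goal (4, 4)."""
    return abs(state[0] - 4) + abs(state[1] - 4)

plan = a_star((0, 0), (4, 4), grid_neighbors, manhattan)
```

In the paper's setting the states and transitions would come from the generated PDDL domain rather than a grid, but the search machinery is the same.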
Abstract
Solving complex planning problems requires Large Language Models (LLMs) to explicitly model the state transition to avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language. To overcome such ambiguity, Planning Domain Definition Language (PDDL) is leveraged as a planning abstraction that enables precise and formal state descriptions. With PDDL, we can generate a symbolic world model where classic search algorithms, such as A*, can be seamlessly applied to find optimal plans. However, directly generating PDDL domains with current LLMs remains an open challenge due to the lack of PDDL training data. To address this challenge, we propose to scale up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm, which first employs a Best-of-N sampling approach to improve the quality of the initial solution and then refines the solution in a fine-grained manner with verbalized machine learning. Our method outperforms o1-mini by a considerable margin in the generation of PDDL domains, achieving over 50% success rate on two tasks (i.e., generating PDDL domains from natural language descriptions or PDDL problems). This is done without requiring additional training. By taking advantage of PDDL as a state abstraction, our method is able to outperform current state-of-the-art methods on almost all competition-level planning tasks.