TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning

Yinuo Wang, Mining Tan, Wenxiang Jiao, Xiaoxi Li, Hao Wang, Xuanyu Zhang, Yuan Lu, Weiming Dong

2026-01-13

Summary

This paper introduces TourPlanner, a system that automatically generates travel itineraries, deciding what to do and where to go on a trip.

What's the problem?

Planning a trip is hard for computers because there is a lot to weigh at once. First, the list of candidate places (points of interest, or POIs) has to be narrowed down without dropping good options. Second, most systems follow a single line of reasoning, which limits how much of the space of workable plans they explore and makes it harder to find the *best* one. Finally, it is difficult to satisfy things you *must* do (like staying within a budget) and things you *want* to do (like visiting specific types of restaurants) at the same time.

What's the solution?

TourPlanner tackles these problems in three ways. It starts with a step called Personalized Recall and Spatial Optimization (PReSO), which suggests places to visit while taking their locations into account. Then, instead of following a single plan, it explores several candidate plans using a multi-path reasoning technique called Competitive consensus Chain-of-Thought (CCoT). Finally, it is trained with reinforcement learning that uses a sigmoid-based gate: essential requirements, such as budget or time constraints, come first, and preferences, such as food choices, are prioritized only once those requirements are met.

Why it matters?

According to the paper's experiments on travel planning benchmarks, TourPlanner achieves state-of-the-art performance and significantly surpasses existing methods on two fronts: feasibility (the plans can actually be carried out) and alignment with user preferences (the plans better match what a user would enjoy).

Abstract

Travel planning is a sophisticated decision-making process that requires synthesizing multifaceted information to construct itineraries. However, existing travel planning approaches face several challenges: (1) pruning candidate points of interest (POIs) while maintaining a high recall rate is difficult; (2) a single reasoning path restricts exploration of the feasible solution space; (3) simultaneously optimizing hard constraints and soft constraints remains a significant difficulty. To address these challenges, we propose TourPlanner, a comprehensive framework featuring multi-path reasoning and constraint-gated reinforcement learning. Specifically, we first introduce a Personalized Recall and Spatial Optimization (PReSO) workflow to construct a spatially aware set of candidate POIs. Subsequently, we propose Competitive consensus Chain-of-Thought (CCoT), a multi-path reasoning paradigm that improves the ability to explore the feasible solution space. To further refine the plan, we integrate a sigmoid-based gating mechanism into the reinforcement learning stage, which dynamically prioritizes soft-constraint satisfaction only after hard constraints are met. Experimental results on travel planning benchmarks demonstrate that TourPlanner achieves state-of-the-art performance, significantly surpassing existing methods in both feasibility and user-preference alignment.
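
The abstract does not give the exact form of the gate. As a minimal sketch, assuming hard- and soft-constraint scores normalized to [0, 1], a sigmoid gate on the hard-constraint score could scale the soft-constraint reward as shown below. The function name `gated_reward` and the `threshold` and `sharpness` parameters are illustrative assumptions, not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_reward(
    hard_score: float,
    soft_score: float,
    threshold: float = 0.9,   # assumed: hard score at which the gate is half open
    sharpness: float = 20.0,  # assumed: how steeply the gate opens around the threshold
) -> float:
    """Combine hard- and soft-constraint scores (both assumed to lie in [0, 1]).

    The soft-constraint term is multiplied by a sigmoid gate on the hard score,
    so preferences contribute little until the hard constraints are mostly met.
    This is a hypothetical reward shape, not the paper's formula.
    """
    gate = sigmoid(sharpness * (hard_score - threshold))
    return hard_score + gate * soft_score


if __name__ == "__main__":
    # A plan that violates many hard constraints earns almost no soft reward.
    print(gated_reward(hard_score=0.5, soft_score=0.8))
    # A plan that meets all hard constraints earns most of its soft reward.
    print(gated_reward(hard_score=1.0, soft_score=0.8))
```

With these assumed defaults, a plan with a hard score of 0.5 receives almost none of its soft-constraint reward, while a plan with a hard score of 1.0 receives about 88% of it. A smooth gate, unlike a hard cutoff, keeps the reward continuous in the hard score.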
