DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu
2024-10-09

Summary
This paper presents DOTS, a new method that helps large language models (LLMs) improve their reasoning skills by finding the best way to think through problems based on the specific question.
What's the problem?
Many LLMs struggle to reason effectively because they apply the same fixed reasoning strategy (for example, always prompting step-by-step) to every question. They don't adapt their thinking to the unique characteristics of each question, which can lead to poor answers, especially on complex tasks.
What's the solution?
The authors propose DOTS, which lets LLMs dynamically adjust their reasoning process. It involves three main steps: defining small, atomic reasoning actions that can be composed into trajectories; searching for the best combination of these actions for each training question; and training a model to plan such optimized reasoning paths for new, unseen questions. This flexible approach lets LLMs allocate more computation to harder problems and improves their overall performance.
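The first two steps above can be sketched as a brute-force search over compositions of atomic actions. This is a minimal illustration, not the paper's implementation: the action names, the layered structure, and the `solve` callback (standing in for a call to the task-solving LLM) are all assumptions for the sake of the example.

```python
import itertools

# Hypothetical atomic reasoning actions, grouped into layers.
# The names are illustrative, not the paper's exact modules.
ACTIONS = {
    "analysis": ["rewrite_query", "decompose", "none"],
    "solution": ["chain_of_thought", "program_of_thought"],
    "verification": ["self_verify", "none"],
}

def candidate_trajectories():
    """Enumerate every trajectory: one action per layer."""
    return itertools.product(*ACTIONS.values())

def search_optimal_trajectory(question, answer, solve, n_samples=3):
    """Score each trajectory by how often the task-solving LLM
    (`solve`, a stand-in callback) reaches the known answer when
    guided by it, and keep the best-scoring one."""
    best, best_score = None, -1.0
    for traj in candidate_trajectories():
        hits = sum(solve(question, traj) == answer for _ in range(n_samples))
        score = hits / n_samples
        if score > best_score:
            best, best_score = traj, score
    return best, best_score
```

In the paper the search is iterative exploration and evaluation rather than exhaustive enumeration, but the idea is the same: the trajectory is judged by whether it actually helps the specific task-solving LLM answer correctly.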
Why it matters?
This research is important because it enhances how AI models can reason and solve problems, making them more effective in real-world applications like tutoring systems, customer support, and any task that requires critical thinking. By improving LLMs' reasoning abilities, we can create smarter AI that better understands and responds to complex inquiries.
Abstract
Enhancing the capability of large language models (LLMs) in reasoning has gained significant attention in recent years. Previous studies have demonstrated the effectiveness of various prompting strategies in aiding LLMs in reasoning (called "reasoning actions"), such as step-by-step thinking, reflecting before answering, solving with programs, and their combinations. However, these approaches often applied static, predefined reasoning actions uniformly to all questions, without considering the specific characteristics of each question or the capability of the task-solving LLM. In this paper, we propose DOTS, an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM. Our approach involves three key steps: i) defining atomic reasoning action modules that can be composed into various reasoning action trajectories; ii) searching for the optimal action trajectory for each training question through iterative exploration and evaluation for the specific task-solving LLM; and iii) using the collected optimal trajectories to train an LLM to plan for the reasoning trajectories of unseen questions. In particular, we propose two learning paradigms, i.e., fine-tuning an external LLM as a planner to guide the task-solving LLM, or directly fine-tuning the task-solving LLM with an internalized capability for reasoning action planning. Our experiments across eight reasoning tasks show that our method consistently outperforms static reasoning techniques and the vanilla instruction tuning approach. Further analysis reveals that our method enables LLMs to adjust their computation based on problem complexity, allocating deeper thinking and reasoning to harder problems.
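Step iii) turns the searched trajectories into training data for a planner. A hedged sketch of that data-preparation step, assuming a generic instruction-tuning record format (the field names and the arrow-joined trajectory encoding are illustrative choices, not the paper's):

```python
def make_planner_examples(records):
    """Convert (question, optimal_trajectory) pairs produced by the
    search into instruction-tuning examples for a planner LLM.
    `records` is an iterable of (str, tuple-of-action-names) pairs."""
    examples = []
    for question, trajectory in records:
        examples.append({
            "instruction": "Choose a reasoning action trajectory for the question.",
            "input": question,
            # Encode the trajectory as a readable action sequence.
            "output": " -> ".join(trajectory),
        })
    return examples
```

Under the external-planner paradigm these examples fine-tune a separate LLM whose output trajectory then guides the task-solving LLM; under the internalized paradigm the task-solving LLM itself is fine-tuned to emit the plan before solving.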