HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang
2024-12-30

Summary
This paper introduces HuatuoGPT-o1, a new large language model (LLM) designed specifically for complex reasoning in the medical field, improving how AI can understand and solve medical problems.
What's the problem?
While many advancements have been made in AI reasoning, most research has focused on mathematical problems, leaving medical reasoning underexplored. Medical tasks require strong reasoning skills because they involve real-world applications that can affect people's health. However, verifying whether an AI's medical reasoning is correct is much harder than checking math answers.
What's the solution?
To tackle this issue, the authors propose a two-stage approach built on verifiable medical problems. First, they use a medical verifier to check the correctness of the model's outputs; the verifier guides the search for complex reasoning trajectories, which are then used to fine-tune the model. Second, they apply reinforcement learning (RL) with verifier-based rewards to further improve the model's reasoning abilities. The result is HuatuoGPT-o1, which is trained on 40,000 carefully selected medical questions and outperforms other models in solving complex medical problems.
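The two-stage loop can be sketched as follows. This is a minimal illustration, assuming a toy exact-match verifier as a stand-in for the paper's actual verifier (the paper only specifies that the verifier checks the correctness of model outputs); the function names here are hypothetical.

```python
# Hedged sketch of verifier-guided training signals (not the paper's code).
# Assumption: the verifier is approximated by exact answer matching;
# the real verifier is more sophisticated.

def verify(model_answer: str, ground_truth: str) -> bool:
    """Toy medical verifier: accepts the answer if it matches the
    reference answer, ignoring case and surrounding whitespace."""
    return model_answer.strip().lower() == ground_truth.strip().lower()

def rl_reward(model_answer: str, ground_truth: str) -> float:
    """Stage 2 (RL): a verifier-based reward, 1.0 if verified else 0.0."""
    return 1.0 if verify(model_answer, ground_truth) else 0.0

def search_reasoning_trajectory(candidates, ground_truth):
    """Stage 1: scan sampled (reasoning, answer) candidates and keep the
    first one whose answer passes the verifier; verified trajectories
    become fine-tuning data."""
    for reasoning, answer in candidates:
        if verify(answer, ground_truth):
            return reasoning, answer
    return None  # no verified trajectory; resample or discard the problem
```

For example, `search_reasoning_trajectory([("guess", "ibuprofen"), ("careful chain", "aspirin")], "Aspirin")` would keep the second candidate, since only its answer passes the verifier.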
Why it matters?
This research is important because it enhances the ability of AI to assist in healthcare by providing more reliable and accurate medical reasoning. By focusing on verifiable problems and using advanced training techniques, HuatuoGPT-o1 can potentially improve decision-making in medical settings, leading to better patient outcomes and more effective healthcare solutions.
Abstract
The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning to improve LLMs. Yet, most research in reasoning has focused on mathematical tasks, leaving domains like medicine underexplored. The medical domain, though distinct from mathematics, also demands robust reasoning to provide reliable answers, given the high standards of healthcare. However, verifying medical reasoning is challenging, unlike verifying mathematical reasoning. To address this, we propose verifiable medical problems with a medical verifier to check the correctness of model outputs. This verifiable nature enables advancements in medical reasoning through a two-stage approach: (1) using the verifier to guide the search for complex reasoning trajectories for fine-tuning LLMs, (2) applying reinforcement learning (RL) with verifier-based rewards to enhance complex reasoning further. Finally, we introduce HuatuoGPT-o1, a medical LLM capable of complex reasoning, which outperforms general and medical-specific baselines using only 40K verifiable problems. Experiments show complex reasoning improves medical problem-solving and benefits more from RL. We hope our approach inspires advancements in reasoning across medical and other specialized domains.