Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

Baichuan-M3 Team, Chengfeng Dou, Fan Yang, Fei Li, Jiyuan Jia, Qiang Ju, Shuai Wang, Tianpeng Li, Xiangrong Zeng, Yijie Zhou, Hongda Zhang, Jinyang Tai, Linzhuang Sun, Peidong Guo, Yichuan Mo, Xiaochuan Wang, Hengfu Cui, Zhishou Zhang

2026-02-09

Summary

This paper introduces Baichuan-M3, a large language model built for medical applications. Its goal is to move beyond passively answering questions toward actively supporting clinical decisions, the way a physician would.

What's the problem?

Current AI systems struggle with complex medical consultations because they often just respond to questions without actively seeking missing information or connecting all the pieces of a patient's case to reach a diagnosis. They can also make up information (hallucinate), which is dangerous in a medical setting.

What's the solution?

The creators of Baichuan-M3 trained it with a specialized pipeline that models how a physician works through a case. It is designed to ask clarifying questions when information is missing or ambiguous, to combine evidence scattered across a long consultation into a coherent diagnosis, and to suppress incorrect or made-up information. On HealthBench and the newly introduced HealthBench-Hallu and ScanBench benchmarks, it achieved state-of-the-art results and significantly outperformed GPT-5.2 in clinical inquiry, advice, and safety.

Why does it matter?

This is important because it represents a step towards AI that can truly assist doctors and improve patient care. By being proactive, reasoning thoroughly, and prioritizing accuracy, Baichuan-M3 has the potential to be a valuable tool in the medical field, helping with diagnosis and treatment decisions.

Abstract

We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipeline to model the systematic workflow of a physician. Key capabilities include: (i) proactive information acquisition to resolve ambiguity; (ii) long-horizon reasoning that unifies scattered evidence into coherent diagnoses; and (iii) adaptive hallucination suppression to ensure factual reliability. Empirical evaluations demonstrate that Baichuan-M3 achieves state-of-the-art results on HealthBench, the newly introduced HealthBench-Hallu and ScanBench, significantly outperforming GPT-5.2 in clinical inquiry, advisory and safety. The models are publicly available at https://huggingface.co/collections/baichuan-inc/baichuan-m3.