Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
Sheng Wang, Ruiming Wu, Charles Herndon, Yihang Liu, Shunsuke Koga, Jeanne Shen, Zhi Huang
2025-10-14
Summary
This paper introduces a new way to train AI to help pathologists analyze whole-slide images, which are basically very detailed digital scans of tissue samples. The goal is to create an AI system that can not only identify potential problems but also explain *how* it arrived at its conclusion, mimicking the thought process of a human pathologist.
What's the problem?
Current AI models are good at recognizing patterns in these images, but they lack the ability to actively 'explore' the slide like a pathologist does. Pathologists don't just look at the whole slide at once; they zoom in and out, move around to different areas, and focus on specific regions. The biggest challenge is that this 'expert viewing behavior' isn't written down anywhere – it's learned through years of experience and is hard to translate into data that AI can learn from.
What's the solution?
The researchers developed a tool called the AI Session Recorder that quietly tracks how pathologists navigate through slides using standard viewing software. This creates a record of where they look and at what magnification. They then used this data, along with a bit of human review, to create a dataset called Pathology-CoT. This dataset essentially tells the AI *where* to look and *why* that area is important. Using this data, they built an AI system called Pathologist-o3 that first identifies areas of interest and then reasons about them, guided by the recorded pathologist behavior. It performed better than existing AI models in detecting cancer spread in lymph nodes.
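The log-to-command conversion described above can be sketched in code. This is a hypothetical illustration, not the paper's actual implementation: the real log schema, field sizes, and dwell-time thresholds are not given in the summary, so the class names, `field_size`, and `inspect_threshold` below are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ViewerEvent:
    """One navigation event from a WSI viewer log (hypothetical schema)."""
    x: float               # viewport center, slide coordinates
    y: float
    magnification: float   # e.g. 5, 10, 20, 40
    dwell_seconds: float   # time the pathologist stayed at this view

@dataclass
class BehaviorCommand:
    action: str            # "inspect" (deliberate look) or "peek" (brief glance)
    magnification: int
    bbox: tuple            # (x0, y0, x1, y1) approximating the viewed field

def events_to_commands(events, field_size=2048.0, inspect_threshold=2.0):
    """Convert raw navigation events into discrete behavioral commands.

    A long dwell becomes an 'inspect', a brief visit a 'peek'. The bounding
    box is the viewport footprint, which shrinks as magnification increases.
    Both thresholds here are illustrative guesses.
    """
    commands = []
    for ev in events:
        half = field_size / ev.magnification / 2  # viewport half-width at this zoom
        bbox = (ev.x - half, ev.y - half, ev.x + half, ev.y + half)
        action = "inspect" if ev.dwell_seconds >= inspect_threshold else "peek"
        commands.append(BehaviorCommand(action, int(ev.magnification), bbox))
    return commands
```

In this toy version, a five-second stop at 20x becomes an `inspect` over a small bounding box, while a half-second pass at 5x becomes a `peek` over a wide one, mirroring the paper's "where to look" supervision.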
Why does it matter?
This work is important because it's one of the first attempts to build an AI system for pathology that learns from *how* pathologists actually work, not just *what* they see. By turning everyday viewing logs into useful training data, it makes it more practical to create AI tools that can assist pathologists, improve accuracy, and ultimately lead to better patient care. It also sets the stage for AI systems that can continuously learn and improve as they are used in clinical settings.
Abstract
Diagnosing a whole-slide image is an interactive, multi-stage process involving changes in magnification and movement between fields. Although recent pathology foundation models are strong, practical agentic systems that decide what field to examine next, adjust magnification, and deliver explainable diagnoses are still lacking. The blocker is data: scalable, clinically aligned supervision of expert viewing behavior that is tacit and experience-based, not written in textbooks or online, and therefore absent from large language model training. We introduce the AI Session Recorder, which works with standard WSI viewers to unobtrusively record routine navigation and convert the viewer logs into standardized behavioral commands (inspect or peek at discrete magnifications) and bounding boxes. A lightweight human-in-the-loop review turns AI-drafted rationales into the Pathology-CoT dataset, a form of paired "where to look" and "why it matters" supervision produced at roughly one-sixth the labeling time. Using this behavioral data, we build Pathologist-o3, a two-stage agent that first proposes regions of interest and then performs behavior-guided reasoning. On gastrointestinal lymph-node metastasis detection, it achieved 84.5% precision, 100.0% recall, and 75.4% accuracy, exceeding the state-of-the-art OpenAI o3 model and generalizing across backbones. To our knowledge, this constitutes one of the first behavior-grounded agentic systems in pathology. By turning everyday viewer logs into scalable, expert-validated supervision, our framework makes agentic pathology practical and establishes a path to human-aligned, upgradeable clinical AI.