
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays

Hyungyung Lee, Geon Choi, Jung-Oh Lee, Hangyul Yoon, Hyuk Gi Hong, Edward Choi

2025-05-30


Summary

This paper introduces CXReasonBench, a new benchmark for testing how well AI models can examine chest X-rays and reason through medical problems, checking whether they can reach accurate, logical diagnoses the way a real doctor would.

What's the problem?

While AI models are getting better at reading medical images, it is hard to know whether they truly understand what they are seeing or can explain their reasoning in a structured, reliable way, which matters for trust and safety in healthcare.

What's the solution?

The researchers created two tools, CheXStruct and CXReasonBench, that carefully measure how well these AI models connect what they see in chest X-rays to medical knowledge, explain their reasoning step by step, and generalize across different kinds of cases, using a large dataset of real X-rays (MIMIC-CXR-JPG).

Why it matters?

This matters because it helps ensure that AI used in hospitals is not only accurate but also trustworthy and understandable, which can lead to better patient care and greater confidence in using AI for important medical decisions.

Abstract

CheXStruct and CXReasonBench evaluate Large Vision-Language Models in clinical diagnosis by assessing structured reasoning, visual grounding, and generalization using the MIMIC-CXR-JPG dataset.