Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation
Jong Hak Moon, Geon Choi, Paloma Rabaey, Min Gwan Kim, Hyuk Gi Hong, Jung-Oh Lee, Hangyul Yoon, Eun Woo Doe, Jiyoun Kim, Harshita Sharma, Daniel C. Castro, Javier Alvarez-Valle, Edward Choi
2025-05-30
Summary
This paper introduces LUNGUAGE, a benchmark for testing how well AI can write detailed, structured reports about chest X-rays, and LUNGUAGESCORE, a metric for measuring the quality of those reports.
What's the problem?
The problem is that current AI models have trouble creating clear, structured, and accurate reports for chest X-rays, especially when doctors need to track changes in a patient's health over time.
What's the solution?
The researchers built LUNGUAGE, a benchmark that challenges AI to generate well-organized radiology reports, and LUNGUAGESCORE, a metric that evaluates those reports at a fine-grained level, including how they handle information across multiple check-ups.
Why does it matter?
This is important because better AI-generated reports can help doctors make more accurate decisions and keep better track of patients' health, which could lead to improved care and faster diagnoses.
Abstract
The paper introduces LUNGUAGE, a benchmark for structured radiology report generation, and LUNGUAGESCORE, an evaluation metric, enabling fine-grained structured report evaluation and longitudinal interpretation.