Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation

Jong Hak Moon, Geon Choi, Paloma Rabaey, Min Gwan Kim, Hyuk Gi Hong, Jung-Oh Lee, Hangyul Yoon, Eun Woo Doe, Jiyoun Kim, Harshita Sharma, Daniel C. Castro, Javier Alvarez-Valle, Edward Choi

2025-05-30

Summary

This paper introduces LUNGUAGE, a benchmark for testing how well AI models write detailed, structured reports about chest X-rays, and LUNGUAGESCORE, a metric for measuring the quality of those reports.

What's the problem?

The problem is that current AI models have trouble creating clear, structured, and accurate reports for chest X-rays, especially when doctors need to track changes in a patient's health over time.

What's the solution?

The researchers built LUNGUAGE, a benchmark that challenges AI to generate well-organized radiology reports, and LUNGUAGESCORE, a metric that scores those reports in fine-grained detail, including how well they track information across a patient's successive check-ups.

Why does it matter?

This is important because better AI-generated reports can help doctors make more accurate decisions and keep better track of patients' health, which could lead to improved care and faster diagnoses.

Abstract

The paper introduces LUNGUAGE, a benchmark for structured radiology report generation, and LUNGUAGESCORE, an evaluation metric, enabling fine-grained structured report evaluation and longitudinal interpretation.
