Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang

2024-10-28

Summary

This paper presents a method for teaching multimodal large language models (MLLMs) to understand and interpret electrocardiogram (ECG) images, a standard tool for diagnosing heart conditions.

What's the problem?

Current methods for automatically interpreting ECGs often cover only a narrow range of heart conditions and rely on raw physiological signals, which are not available in every setting. Many healthcare providers have access only to printed or digital images of ECGs, which makes existing automatic interpretation tools difficult to apply. In addition, there is a lack of datasets specifically designed to teach models how to interpret ECG images.

What's the solution?

To address these issues, the authors created ECGInstruct, a dataset of over one million ECG image samples covering a wide variety of ECG interpretation tasks. Using it, they trained PULSE, a multimodal large language model tailored to understanding ECG images. They also curated ECGBench, an evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. In their experiments, PULSE set a new state of the art, outperforming general-purpose MLLMs with an average accuracy improvement of 15% to 30%.
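To make the instruction-tuning setup concrete, here is a minimal sketch of what one ECGInstruct-style training sample and a simple benchmark scoring function could look like. The field names, task label, file path, and exact-match scoring below are illustrative assumptions, not the paper's actual schema or ECGBench's metric.

```python
# Hypothetical ECGInstruct-style sample: an ECG image paired with an
# instruction and a reference answer. All field names are assumptions
# made for illustration; the paper defines the real schema.
sample = {
    "image_path": "ecg_images/sample_000001.png",  # rendered 12-lead ECG image
    "task": "abnormality_detection",               # one of several task types
    "instruction": "Identify any cardiac abnormalities visible in this ECG.",
    "answer": "Atrial fibrillation with rapid ventricular response.",
}

def accuracy(predictions, references):
    """Exact-match accuracy: one plausible way a benchmark like ECGBench
    might score closed-ended tasks (an assumption, not the paper's metric)."""
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Example: a single prediction scored against its reference answer.
preds = ["Atrial fibrillation with rapid ventricular response."]
refs = ["atrial fibrillation with rapid ventricular response."]
print(accuracy(preds, refs))  # 1.0
```

In practice, open-ended ECG report generation would need a softer metric than exact match (for example, overlap- or model-based scoring), which is why the benchmark spans multiple task types.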

Why it matters?

This research is important because it enhances the ability of AI models to interpret ECG images accurately, which can improve diagnostic processes in healthcare. By providing better tools for analyzing ECGs, this work has the potential to help doctors make more informed decisions about patient care, especially in situations where access to raw data is limited.

Abstract

The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically depend on raw physiological signals, which may not be readily available in resource-limited settings where only printed or digital ECG images are accessible. Recent advancements in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, the application of MLLMs to ECG image interpretation remains challenging due to the lack of instruction tuning datasets and well-established ECG image benchmarks for quantitative evaluation. To address these challenges, we introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples, covering a wide range of ECG-related tasks from diverse data sources. Using ECGInstruct, we develop PULSE, an MLLM tailored for ECG image comprehension. In addition, we curate ECGBench, a new evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.