HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi
2025-02-19

Summary
This paper introduces HealthGPT, a new AI model designed for medical applications that can both understand and create medical images. It's like a super-smart doctor's assistant that can look at X-rays or scans and explain what it sees, as well as create new medical images based on descriptions.
What's the problem?
Current AI models in healthcare are usually good at either understanding medical images or creating them, but not both. This limits how useful they can be in real medical situations where doctors might need to both analyze existing images and visualize potential treatments or outcomes.
What's the solution?
The researchers created HealthGPT, which uses a technique called H-LoRA (heterogeneous low-rank adaptation) to teach a pre-trained large language model (like the ones used in chatbots) how to work with medical images without retraining it from scratch. They also built a dedicated dataset called VL-Health to train the model on a wide range of medical comprehension and generation tasks. HealthGPT combines a three-stage learning strategy with a hierarchical way of perceiving images to become good at both understanding and creating medical visuals; a rough sketch of the adapter idea is shown below.
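To make the adapter idea concrete, here is a minimal PyTorch sketch of low-rank adaptation with a separate adapter per task, loosely in the spirit of H-LoRA. The class name, the route-by-task-name interface, and the hyperparameters are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of task-routed low-rank adaptation (LoRA-style), loosely
# inspired by the H-LoRA idea described above. Names and defaults here are
# illustrative assumptions, not the HealthGPT codebase's API.
import torch
import torch.nn as nn

class TaskLoRALinear(nn.Module):
    """A frozen linear layer plus one trainable low-rank adapter per task
    (e.g., 'comprehension' vs. 'generation')."""

    def __init__(self, base: nn.Linear, tasks, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pre-trained weights stay frozen
        self.scale = alpha / rank
        # One (A, B) low-rank pair per task; only these small matrices train.
        self.A = nn.ParameterDict({
            t: nn.Parameter(torch.randn(rank, base.in_features) * 0.01) for t in tasks
        })
        self.B = nn.ParameterDict({
            t: nn.Parameter(torch.zeros(base.out_features, rank)) for t in tasks
        })

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        # y = W x + scale * B_task (A_task x)
        delta = x @ self.A[task].T @ self.B[task].T
        return self.base(x) + self.scale * delta

# Usage: route the same backbone layer through different adapters per task.
layer = TaskLoRALinear(nn.Linear(512, 512), tasks=["comprehension", "generation"])
x = torch.randn(2, 512)
out_und = layer(x, task="comprehension")
out_gen = layer(x, task="generation")
```

As in standard LoRA, the B matrices start at zero, so training begins from the unmodified pre-trained model, and each task updates only its own small adapter rather than the full backbone.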
Why does it matter?
This matters because it could make AI much more useful in healthcare. Doctors could use HealthGPT to get second opinions on diagnoses, explain complex medical images to patients, or even visualize how a treatment might work. It could speed up medical processes, improve accuracy, and help doctors and patients better understand complex medical situations. This kind of AI could potentially make advanced medical imaging expertise more accessible, especially in areas where specialist doctors are scarce.
Abstract
We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate the exceptional performance and scalability of HealthGPT on unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.
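As a companion to the abstract's three-stage learning strategy, the sketch below illustrates how a staged schedule could be expressed by freezing and unfreezing parameter groups around a frozen LLM backbone. The stage boundaries, module names, and what trains at each stage are assumptions made for illustration; the paper defines its own stages and training details.

```python
# Hypothetical illustration of a staged adaptation schedule around a frozen
# backbone. Module names and per-stage choices are assumptions, not the
# paper's exact recipe.
import torch.nn as nn

class ToyMedLVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_adapter = nn.Linear(256, 512)  # projects visual features into LLM space
        self.llm_backbone = nn.Linear(512, 512)    # stand-in for the pre-trained LLM
        self.task_adapters = nn.ModuleDict({       # low-rank-style adapters per task
            "comprehension": nn.Linear(512, 512, bias=False),
            "generation": nn.Linear(512, 512, bias=False),
        })

def set_frozen(module: nn.Module, frozen: bool) -> None:
    for p in module.parameters():
        p.requires_grad = not frozen

model = ToyMedLVLM()
set_frozen(model.llm_backbone, True)   # backbone stays frozen in every stage

# Stage 1: align visual features with the language model's embedding space.
set_frozen(model.vision_adapter, False)
set_frozen(model.task_adapters, True)

# Stage 2: adapt the heterogeneous task adapters for comprehension/generation.
set_frozen(model.vision_adapter, True)
set_frozen(model.task_adapters, False)

# Stage 3: joint instruction-style fine-tuning of all adapters.
set_frozen(model.vision_adapter, False)
set_frozen(model.task_adapters, False)
```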