
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Sara Pieri, Saeed Yahya Alseiari, Shanavas Cholakkal, Khaled Aldahmani, Fahad Khan, Rao Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal

2024-12-16


Summary

This paper introduces BiMediX2, a bilingual AI model designed to analyze and understand medical images and text in both Arabic and English, making healthcare information more accessible.

What's the problem?

Many existing AI models struggle to effectively process medical information in multiple languages, especially in less-resourced languages like Arabic. This limits their usefulness in providing accurate medical insights and assistance in diverse healthcare settings.

What's the solution?

BiMediX2 addresses this issue with a unified architecture built on Llama 3.1 that integrates text and visual data, allowing it to understand medical images (like X-rays and MRIs) and respond to questions in both Arabic and English. It was trained on a large dataset of 1.6 million bilingual medical samples covering both text- and image-based interactions, which helps it perform well across a range of medical tasks such as visual question answering, report generation, and report summarization.
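The paper does not spell out the fusion mechanism in this summary, but a common pattern in this family of large multimodal models (LLaVA-style designs) is to project vision-encoder patch features into the LLM's token-embedding space so images can be consumed alongside text. The sketch below illustrates that general idea; it is a minimal, hypothetical illustration rather than BiMediX2's actual implementation, and the encoder output size, projector shape, and embedding dimension are all assumptions.

```python
import torch
import torch.nn as nn

class VisionToTextProjector(nn.Module):
    """Minimal sketch of a LLaVA-style projector (an assumption, not
    BiMediX2's exact design): maps vision-encoder patch features into
    the LLM's token-embedding space, so image patches can be consumed
    as ordinary "tokens" alongside Arabic/English text."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP, a common choice for multimodal projectors.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        # returns:        (batch, num_patches, llm_dim)
        return self.proj(patch_features)

# Toy usage: 576 patch embeddings from a ViT-style encoder are mapped
# into a (hypothetical) 4096-dim LLM embedding space, then concatenated
# with text embeddings before the language model decodes a response.
projector = VisionToTextProjector()
image_feats = torch.randn(1, 576, 1024)    # stand-in encoder output
image_tokens = projector(image_feats)      # (1, 576, 4096)
text_tokens = torch.randn(1, 32, 4096)     # stand-in text embeddings
llm_input = torch.cat([image_tokens, text_tokens], dim=1)
print(llm_input.shape)                     # torch.Size([1, 608, 4096])
```

The appeal of this design is that the language model itself stays unchanged: the projector is a small trainable adapter that makes image features look like text embeddings, which fits the paper's description of a unified architecture over both modalities.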

Why it matters?

This research is important because it enhances the ability of AI to provide medical insights in multiple languages, making healthcare more accessible for Arabic-speaking populations. By improving the accuracy of bilingual medical question answering and image analysis, BiMediX2 can support better patient care and diagnostics, paving the way for more inclusive healthcare solutions.

Abstract

This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) with a unified architecture that integrates text and visual modalities, enabling advanced image understanding and medical applications. BiMediX2 leverages the Llama3.1 architecture and integrates text and visual capabilities to facilitate seamless interactions in both English and Arabic, supporting text-based inputs and multi-turn conversations involving medical images. The model is trained on an extensive bilingual healthcare dataset consisting of 1.6M samples of diverse medical interactions for both text and image modalities, mixed in Arabic and English. We also propose the first bilingual GPT-4o based medical LMM benchmark named BiMed-MBench. BiMediX2 is benchmarked on both text-based and image-based tasks, achieving state-of-the-art performance across several medical benchmarks. It outperforms recent state-of-the-art models in medical LLM evaluation benchmarks. Our model also sets a new benchmark in multimodal medical evaluations with over 9% improvement in English and over 20% in Arabic evaluations. Additionally, it surpasses GPT-4 by around 9% in UPHILL factual accuracy evaluations and excels in various medical Visual Question Answering, Report Generation, and Report Summarization tasks. The project page, including source code and the trained model, is available at https://github.com/mbzuai-oryx/BiMediX2.
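Since the abstract notes that the trained model is released, a text-only query against the checkpoint might look like the sketch below. This is a hypothetical usage example: the Hugging Face repo id and prompt format are assumptions (check the project page above for the actual release instructions); only the transformers calls themselves are standard.

```python
# Hypothetical usage sketch: querying BiMediX2 as a text-only chat model.
# The repo id below is an assumption; see the project page
# (https://github.com/mbzuai-oryx/BiMediX2) for the actual released weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mbzuai-oryx/BiMediX2"  # assumed id, not verified
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The model is trained on mixed Arabic/English data, so the same
# interface should serve prompts in either language.
prompt = "What are common findings on a chest X-ray in pneumonia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```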