SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging

Tan-Hanh Pham, Chris Ngo, Trong-Duong Bui, Minh Luu Quang, Tan-Huong Pham, Truong-Son Hy

2025-04-22

Summary

This paper introduces SilVar-Med, an AI system that lets doctors interact with medical images by voice and explains what might be abnormal in those images.

What's the problem?

Medical images such as X-rays and MRIs can be complicated to analyze. Most AI tools return an answer without explaining their reasoning and offer doctors no easy way to interact with them, which makes the results hard to trust or understand.

What's the solution?

The researchers created SilVar-Med, which combines speech recognition with a visual language model. Doctors can ask the system questions aloud and get clear explanations of what it sees in an image and why it considers something abnormal.
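
The data flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Answer`, `answer_spoken_question`, `transcribe`, and `explain` are hypothetical, and the two-stage split is a simplification. The abstract describes the model as end-to-end, so the real system may not expose a separate transcription step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    finding: str    # what the model reports seeing in the image
    reasoning: str  # why it considers that finding abnormal


def answer_spoken_question(
    audio: bytes,
    image: bytes,
    transcribe: Callable[[bytes], str],
    explain: Callable[[bytes, str], Answer],
) -> Answer:
    """Pass a spoken question and an image through two hypothetical stages."""
    question = transcribe(audio)     # speech -> text question (assumed stage)
    return explain(image, question)  # image + question -> explained answer


# Stand-in stubs so the sketch runs; a real system would call trained models.
def fake_transcribe(audio: bytes) -> str:
    return "Is there anything abnormal in this scan?"


def fake_explain(image: bytes, question: str) -> Answer:
    return Answer(
        finding="placeholder finding",
        reasoning=f"placeholder reasoning for: {question}",
    )


result = answer_spoken_question(b"", b"", fake_transcribe, fake_explain)
print(result.finding)
print(result.reasoning)
```

The point of the sketch is that the answer carries its reasoning alongside the finding, which is what the summary means by an explainable prediction.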

Why does it matter?

This matters because it makes medical AI tools more user-friendly and trustworthy, helping doctors make better decisions and communicate more clearly with patients about what’s happening in their medical images.

Abstract

SilVar-Med is an end-to-end speech-driven medical visual language model that enhances interpretability and interaction in medical image analysis through voice commands and reasoning-driven predictions.
