RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models
Ronald Fecso, José Morano, Ursula Schmidt-Erfurth, Hrvoje Bogunović
2025-06-30
Summary
This paper introduces RetFiner, a method that improves existing deep learning models for analyzing retinal images by refining them with textual data, such as electronic health records.
What's the problem?
Current foundation models for retinal imaging learn mostly from image data alone, which limits the richness of their representations and their performance on complex medical tasks, and often necessitates expensive, labor-intensive fine-tuning.
What's the solution?
RetFiner uses a vision-language approach that refines these models through self-supervised learning on paired retinal images and text reports. This method improves how well the models understand the images and adapt to specific patient groups, without needing manual labeling or extensive retraining.
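The paper's exact training objectives are not detailed in this summary, but a common ingredient of vision-language refinement on paired images and reports is a CLIP-style symmetric contrastive loss, which pulls matched image-text embedding pairs together and pushes mismatched ones apart. The sketch below illustrates that idea with NumPy; all names and the temperature value are illustrative, not taken from RetFiner.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matched pair sits on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # Average of the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Perfectly aligned pairs drive the loss toward its minimum (near zero)
emb = np.eye(4, 8)
print(contrastive_loss(emb, emb))
```

Because the loss is self-supervised, it needs only naturally occurring image-report pairs, which is consistent with the summary's point that no manual labeling is required.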
Why does it matter?
This matters because it helps create more accurate and adaptable AI tools for diagnosing retinal diseases, which can support doctors in providing better care and can be applied to other medical fields with similar data types.
Abstract
RetFiner, a vision-language refinement scheme, enhances self-supervised foundation models for OCT by leveraging textual data, improving their downstream performance in retinal disease classification tasks.