RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models
Ronald Fecso, José Morano, Ursula Schmidt-Erfurth, Hrvoje Bogunović
2025-06-30
Summary
This paper introduces RetFiner, a method that improves existing deep learning models for analyzing retinal images by refining them with textual data, such as electronic health records.
What's the problem?
Current foundation models for retinal imaging learn mostly from image data alone, which limits the richness of their representations and their performance on complex medical tasks, and often necessitates expensive, labor-intensive fine-tuning.
What's the solution?
RetFiner uses a vision-language approach that refines these models through self-supervised learning on paired retinal images and text reports. This method improves how well the models understand the images and adapt to specific patient groups, without needing manual labeling or extensive retraining.
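The paper's exact training objectives are not detailed in this summary, but a common ingredient of vision-language refinement on paired images and reports is a CLIP-style symmetric contrastive loss, which pulls matched image-text embedding pairs together and pushes mismatched ones apart. The sketch below illustrates that idea with NumPy; all names and the temperature value are illustrative, not taken from RetFiner.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matched pair sits on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # Average of the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Perfectly aligned pairs drive the loss toward its minimum (near zero)
emb = np.eye(4, 8)
print(contrastive_loss(emb, emb))
```

Because the loss is self-supervised, it needs only naturally occurring image-report pairs, which is consistent with the summary's point that no manual labeling is required.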
Why does it matter?
This matters because it helps create more accurate and adaptable AI tools for diagnosing retinal diseases, which can support doctors in providing better care and can be applied to other medical fields with similar data types.
Abstract
RetFiner, a vision-language refinement scheme, enhances self-supervised foundation models for OCT by leveraging textual data, improving their downstream performance in retinal disease classification tasks.