RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Yeongtak Oh, Jisoo Mok, Dohyun Chung, Juhyeon Shin, Sangha Park, Johan Barthelemy, Sungroh Yoon
2025-06-24
Summary
This paper introduces RePIC, a method that improves how multi-modal large language models generate personalized image captions by applying reinforcement learning as a post-training step, after the main supervised training is complete.
What's the problem?
Current fine-tuning methods for personalizing image captions often fail to adapt to individual subjects or unique styles, limiting how accurately and personally the models describe images.
What's the solution?
The researchers developed a reinforcement learning-based post-training approach that teaches the model to personalize image captions by learning from reward feedback, rather than merely imitating fixed examples during supervised fine-tuning.
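The core idea of learning from reward feedback instead of fixed examples can be illustrated with a toy sketch. This is not the paper's actual algorithm; it is a minimal REINFORCE-style example (with hypothetical captions and a made-up reward that favors naming the subject) showing how reward signals, rather than imitation targets, can shift a policy toward personalized outputs:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical toy setup: a "policy" choosing between candidate captions
# for one image. Captions and reward are illustrative, not from the paper.
captions = [
    "A dog in a park.",                 # generic caption
    "Rex, the user's dog, in a park.",  # personalized caption (names the subject)
]
logits = [0.0, 0.0]

def reward(caption, subject="Rex"):
    # Toy reward: 1.0 if the caption names the personalized subject, else 0.0.
    return 1.0 if subject in caption else 0.0

lr = 0.5
random.seed(0)
for _ in range(200):
    probs = softmax(logits)
    # Sample a caption from the current policy.
    i = random.choices(range(len(captions)), weights=probs)[0]
    # Advantage = reward minus the expected reward under the policy (a baseline).
    baseline = sum(p * reward(c) for p, c in zip(probs, captions))
    advantage = reward(captions[i]) - baseline
    # REINFORCE update: grad of log-softmax w.r.t. logit j is (1[j == i] - probs[j]).
    for j in range(len(logits)):
        logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])

print("P(personalized caption):", round(softmax(logits)[1], 3))
```

After training, the policy concentrates probability on the personalized caption, even though it was never shown that caption as a fixed imitation target; the reward alone drives the shift.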
Why it matters?
This matters because it helps AI generate more personalized and accurate image descriptions, making applications such as virtual assistants and accessibility tools more useful and user-friendly.
Abstract
A reinforcement learning-based post-training framework improves the personalized image captioning capabilities of multi-modal large language models compared to supervised fine-tuning methods.