GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Yue Liu, Shengfang Zhai, Mingzhe Du, Yulin Chen, Tri Cao, Hongcheng Gao, Cheng Wang, Xinfeng Li, Kun Wang, Junfeng Fang, Jiaheng Zhang, Bryan Hooi
2025-05-19
Summary
This paper introduces GuardReasoner-VL, a guard model that helps make vision-language models (VLMs) safer by reasoning carefully through a request before deciding whether it is harmful, especially when images and text are involved together.
What's the problem?
The problem is that current VLMs can produce unsafe, biased, or inappropriate responses, and existing safeguards often make moderation decisions without reasoning through the possible risks first, which is a serious issue when these models are deployed in real-world applications.
What's the solution?
The researchers created GuardReasoner-VL, a guard model trained in two stages: supervised fine-tuning (SFT) on a large, diverse corpus of moderation examples with reasoning traces, followed by online reinforcement learning (RL) with a length-aware safety reward that balances correct verdicts against overly long reasoning. The model learns to reason through an input before flagging it as safe or unsafe; a sketch of the reward idea follows below.
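The summary does not give the exact reward formula, but a minimal sketch can illustrate what a length-aware safety reward might look like: pay the guard model for correct verdicts while discounting reasoning that overshoots a length budget. Everything here (the function name, the token budget, the trade-off weight) is a hypothetical illustration under that assumption, not the paper's actual formulation.

```python
def length_aware_safety_reward(
    predicted_label: str,       # guard model's verdict, e.g. "harmful" / "unharmful"
    gold_label: str,            # ground-truth moderation label
    num_reasoning_tokens: int,  # length of the model's reasoning trace
    target_tokens: int = 256,   # hypothetical budget for reasoning length
    length_weight: float = 0.1, # hypothetical trade-off coefficient
) -> float:
    """Reward correct moderation verdicts, discounted when the
    reasoning trace runs past the length budget."""
    correctness = 1.0 if predicted_label == gold_label else 0.0
    # Penalize only tokens beyond the budget, scaled to the budget size,
    # so concise correct answers keep the full reward.
    overshoot = max(0, num_reasoning_tokens - target_tokens) / target_tokens
    return correctness - length_weight * overshoot
```

Under this kind of shaping, a correct and concise verdict earns the full reward, while rambling reasoning earns progressively less, which is one way a reward could account for both safety accuracy and reasoning length.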
Why it matters?
This matters because it helps prevent AI systems from making harmful mistakes, which makes them more trustworthy and responsible; that is especially important as these models become more common in everyday technology.
Abstract
GuardReasoner-VL enhances VLM safety through a reasoning-based guard model trained with SFT and online RL, using diverse datasets and a length-aware safety reward.