
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Yi-Fan Zhang, Xingyu Lu, Xiao Hu, Chaoyou Fu, Bin Wen, Tianke Zhang, Changyi Liu, Kaiyu Jiang, Kaibing Chen, Kaiyu Tang, Haojie Ding, Jiankang Chen, Fan Yang, Zhang Zhang, Tingting Gao, Liang Wang

2025-05-06


Summary

This paper introduces R1-Reward, a new method for training an AI judge (a reward model) that evaluates answers involving both pictures and text, and it uses reinforcement learning in a way that makes this training more stable and effective.

What's the problem?

Reward models that judge multiple types of information, like images and words together, are hard to train with reinforcement learning: training often becomes unstable, which hurts the final quality of their judgments.

What's the solution?

The researchers built a more stable reinforcement learning setup (reinforcement learning is a way for AI to learn by getting feedback on its outputs) and used it to train the multimodal reward model, so training runs more smoothly and the model judges more accurately, as sketched below.
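The paper's actual recipe is more involved than this, but the core idea of "reward the judge when its verdict matches human preferences" can be illustrated with a minimal REINFORCE-style sketch in PyTorch. Everything here is a simplified assumption: ToyRewardModel stands in for a real multimodal model, and the feature and label tensors are random placeholders rather than the paper's data.

```python
# Minimal sketch (not the paper's code): policy-gradient training of a
# reward model that picks which of two answers to an image+question is better.
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    """Stand-in for a multimodal model: maps fused image+text features
    to logits over {answer A is better, answer B is better}."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.head = nn.Linear(feat_dim, 2)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats)

def reinforce_step(model, optimizer, feats, preferred):
    """One update: sample a judgment, score it against the human
    preference label, and reinforce judgments that were correct."""
    logits = model(feats)
    dist = torch.distributions.Categorical(logits=logits)
    choice = dist.sample()                     # model's sampled verdict
    reward = (choice == preferred).float()     # 1 if it matches the label
    advantage = reward - reward.mean()         # batch-mean baseline for stability
    loss = -(dist.log_prob(choice) * advantage).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()                # accuracy on this batch

# Toy usage with random "multimodal" features and preference labels.
model = ToyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    feats = torch.randn(16, 32)                # pretend fused image+text features
    preferred = torch.randint(0, 2, (16,))     # which answer humans preferred
    reinforce_step(model, opt, feats, preferred)
```

The sketch only shows the general pattern; the paper's contribution is in making this kind of training stable at scale, which the toy baseline subtraction merely hints at.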

Why it matters?

This matters because reward models are used to guide and evaluate other AI systems, so making them better at judging both language and visuals helps build smarter AI for things like virtual assistants, content moderation, and creative tools.

Abstract

Using reinforcement learning to train multimodal reward models leads to more stable training and superior performance.