
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang

2025-05-28

Summary

This paper introduces R1-ShareVL, a new way to train AI models that understand both text and images, making them better at reasoning and able to answer a wider range of questions.

What's the problem?

The problem is that even though multimodal large language models can handle both words and pictures, they often struggle to reason through complex problems or answer a wide variety of questions, especially when those questions require connecting different types of information.

What's the solution?

To solve this, the researchers introduced Share-GRPO, a reinforcement learning method that expands each training question into several variants, lets the model share the reasoning paths it discovers across those variants, and uses a layered (hierarchical) scoring scheme to judge which answers are best. This broader exploration helps the model become more flexible and accurate in its reasoning.

Why it matters?

This is important because it means AI can get much better at understanding and solving complicated problems that involve both language and images, which is useful for things like education, research, and digital assistants.

Abstract

Share-GRPO, a novel reinforcement learning approach, enhances Multimodal Large Language Models by expanding the question space, sharing diverse reasoning trajectories, and hierarchical advantage computation.
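To make the "hierarchical advantage computation" idea concrete, here is a minimal sketch. It assumes standard GRPO-style normalization (reward minus group mean, divided by group standard deviation) and blends a per-variant (local) advantage with a pooled (global) advantage across all shared variants of a question. The function names, the `alpha` blending weight, and the exact two-level scheme are illustrative assumptions, not the paper's verified implementation.

```python
import statistics

def grpo_advantages(rewards):
    # Standard GRPO: normalize rewards within one group of rollouts.
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]

def share_grpo_advantages(rewards_per_variant, alpha=0.5):
    # Hypothetical sketch of hierarchical advantage estimation.
    # rewards_per_variant maps each question variant to its rollout rewards.
    # Local level: normalize within each variant's own rollout group.
    # Global level: normalize against all rollouts pooled across variants,
    # so reasoning trajectories are effectively shared over the expanded
    # question space. alpha blends the two levels (assumed weighting).
    all_rewards = [r for rs in rewards_per_variant.values() for r in rs]
    mu_g = statistics.mean(all_rewards)
    sigma_g = statistics.pstdev(all_rewards) or 1.0
    advantages = {}
    for variant, rs in rewards_per_variant.items():
        local = grpo_advantages(rs)
        pooled = [(r - mu_g) / sigma_g for r in rs]
        advantages[variant] = [alpha * l + (1 - alpha) * g
                               for l, g in zip(local, pooled)]
    return advantages
```

With two variants of the same question, a correct rollout ends up with a positive blended advantage and an incorrect one with a negative advantage, even when one variant's group has no internal reward spread.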