MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning

Yiqing Liang, Jielin Qiu, Wenhao Ding, Zuxin Liu, James Tompkin, Mengdi Xu, Mengzhou Xia, Zhengzhong Tu, Laixi Shi, Jiacheng Zhu

2025-06-02

Summary

This paper introduces MoDoMoDo, a framework that helps AI models reason better across different types of data, such as text and images, by combining reinforcement learning with a carefully chosen mixture of training data from multiple domains.

What's the problem?

Multimodal AI models are supposed to understand and combine different kinds of information, but they often struggle to reason well on tasks that mix pictures and words, especially after their main training is finished.

What's the solution?

The researchers created a framework that continues training these models after their initial learning, using reinforcement learning with verifiable rewards, meaning the AI earns a reward whenever its answer can be automatically checked as correct. They also introduced a strategy that mixes training data from many different domains, so the model practices on a wide variety of information and situations. Together, these make the model more flexible and improve its performance across diverse benchmarks.
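The multi-domain mixing described above can be sketched as weighted sampling over several datasets. Note this is a minimal illustration, not the paper's actual method: the dataset names and mixture weights below are invented for demonstration, whereas MoDoMoDo's contribution is learning which weights work best.

```python
import random

# Hypothetical multi-domain datasets (names and contents are illustrative only).
datasets = {
    "math":    [f"math_sample_{i}" for i in range(100)],
    "charts":  [f"chart_sample_{i}" for i in range(100)],
    "science": [f"science_sample_{i}" for i in range(100)],
    "ocr":     [f"ocr_sample_{i}" for i in range(100)],
}

# Illustrative mixture weights: the expected fraction of each batch drawn
# from each domain. These are fixed by hand here; the paper optimizes them.
mixture_weights = {"math": 0.4, "charts": 0.3, "science": 0.2, "ocr": 0.1}

def sample_mixed_batch(batch_size, rng=random):
    """Draw a training batch whose domain proportions follow the mixture weights."""
    domains = list(mixture_weights)
    weights = [mixture_weights[d] for d in domains]
    batch = []
    for _ in range(batch_size):
        domain = rng.choices(domains, weights=weights, k=1)[0]
        batch.append((domain, rng.choice(datasets[domain])))
    return batch

batch = sample_mixed_batch(32)
```

Each training step then runs reinforcement-learning updates on such a mixed batch, so the model sees every domain in roughly the chosen proportions.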

Why it matters?

This matters because it makes AI models smarter and more adaptable, improving their ability to handle real-world problems that involve more than one type of data, with applications in areas like education, research, and technology.

Abstract

The paper presents a framework for post-training multimodal large language models with reinforcement learning and verifiable rewards, introducing a data-mixture strategy that enhances general reasoning ability and benchmark performance.
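A "verifiable reward" as mentioned in the abstract can be as simple as an automatic check of the model's final answer against a known ground truth. The sketch below is an assumption for illustration; in particular, the `Answer:` extraction convention is invented here, not taken from the paper.

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted final answer matches the ground
    truth, else 0.0. Assumes (hypothetically) that the model ends its
    reasoning with a line of the form 'Answer: <value>'."""
    match = re.search(r"Answer:\s*(.+)", model_output)
    if match is None:
        return 0.0  # no parsable answer, no reward
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == ground_truth.strip().lower() else 0.0

verifiable_reward("The area is 3 * 4 = 12. Answer: 12", "12")  # returns 1.0
```

Because the reward is computed by a deterministic check rather than a learned judge, it cannot be gamed by plausible-sounding but wrong reasoning, which is what makes it suitable for reinforcement learning.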