NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu, Jinjie Ni, Zijian Wu, Chao Du, Longxu Dou, Haonan Wang, Tianyu Pang, Michael Qizhe Shieh
2025-04-18
Summary
This paper introduces NoisyRollout, a reinforcement learning method for training AI models that understand images and text together. By mixing slightly distorted images into training, it helps the model learn to reason more reliably across a wide range of visual conditions.
What's the problem?
Vision-language models, which answer questions about images or describe what is happening in them, often struggle when the images are imperfect or look different from those seen during training. This makes the models less reliable, especially on new or challenging visual tasks.
What's the solution?
The researchers created NoisyRollout, a reinforcement learning method that generates training rollouts from both clean and moderately distorted versions of the same image. Seeing multiple versions of the same scene encourages broader exploration and makes the model's reasoning more robust. The approach adds no extra training cost and requires no changes to the model architecture, and it gradually reduces the amount of noise as training goes on to keep optimization stable.
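The idea of pairing clean and noisy rollouts with an annealed noise schedule can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the linear annealing schedule, the Gaussian pixel noise, the `sigma0` default, and the 50/50 clean-to-noisy split are all assumptions chosen for clarity, and images are simplified to flat lists of grayscale pixel values.

```python
import random

def gaussian_noise(image, sigma):
    """Add pixel-wise Gaussian noise to a grayscale image (a flat list of
    floats in [0, 255]), clamping results back into the valid range."""
    return [min(255.0, max(0.0, px + random.gauss(0.0, sigma)))
            for px in image]

def noise_sigma(step, total_steps, sigma0=50.0):
    """Anneal the noise strength toward zero over training.
    (A linear schedule is assumed here; the paper only specifies that
    noise is gradually reduced.)"""
    return sigma0 * (1.0 - step / total_steps)

def mixed_rollout_images(image, step, total_steps, n_clean=4, n_noisy=4):
    """Build the conditioning images for one rollout batch: some rollouts
    see the clean image, the rest see a freshly noised copy."""
    sigma = noise_sigma(step, total_steps)
    batch = [list(image) for _ in range(n_clean)]
    batch += [gaussian_noise(image, sigma) for _ in range(n_noisy)]
    return batch

# Toy 4-pixel "image": early in training, half the batch is noticeably noisy;
# by the final step the noise has annealed to zero and all copies match.
img = [10.0, 120.0, 200.0, 250.0]
early = mixed_rollout_images(img, step=0, total_steps=100)
late = mixed_rollout_images(img, step=100, total_steps=100)
print(len(early))   # 8 conditioning images for one prompt
print(late[-1] == img)
```

In the actual method, the policy then generates reasoning trajectories from both sets of images and all trajectories are optimized together under the usual RL objective, which is what injects the extra exploration without any added training cost.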
Why it matters?
This matters because it helps AI become better at understanding and reasoning about images, even when the visuals aren’t perfect. That means these models can be trusted more in real-world situations, like helping with medical images, robotics, or any job where the AI needs to make sense of what it sees.
Abstract
NoisyRollout, an RL approach that introduces targeted diversity by mixing rollout trajectories from moderately distorted images into training, enhances VLM policy exploration without additional training cost, achieving state-of-the-art performance on out-of-domain benchmarks.