Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Zhuoran Jin, Hongbang Yuan, Kejian Zhu, Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
2025-10-28
Summary
This paper introduces a new approach to building reward models for AI, which are used to make sure AI systems do what humans want them to do. It focuses on making these models work well with different types of data, like text, images, video, audio, and 3D content, and also on making them better at understanding what *individual* people prefer.
What's the problem?
Current reward models have two main weaknesses. First, they mostly work well with just text and images, and struggle with other data types like video, audio, or 3D. This is called 'modality imbalance'. Second, they're trained by simply asking people to pick between two fixed options, which doesn't capture the full range of human preferences – people don't always have a clear 'better' choice, and their preferences can be complex and personal. This is called 'preference rigidity'.
What's the solution?
The researchers developed 'Omni-Reward', a system designed to overcome these problems. They created a new benchmark called 'Omni-RewardBench' to test reward models across many different data types and with more flexible, free-form preferences. They also built a large training dataset, 'Omni-RewardData', containing 248K general preference pairs and 69K instruction-tuning pairs. Finally, they designed a new model, 'Omni-RewardModel', that combines discriminative and generative reward-modeling techniques and performs well both on the new benchmark and on existing reward modeling tests.
Why it matters?
This work is important because it moves us closer to creating AI systems that are truly aligned with human values and can adapt to individual needs. By supporting a wider range of data types and allowing for more nuanced preferences, Omni-Reward helps build more versatile and user-friendly AI.
Abstract
Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs are mainly focused on text and image modalities, offering limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address the above challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities including text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
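To make the "fixed binary preference pairs" idea concrete: a discriminative reward model is commonly trained with a Bradley-Terry pairwise objective, where the model's scalar score for the human-preferred (chosen) response should exceed its score for the rejected one. The sketch below is a minimal illustration of that standard objective, not necessarily the exact loss used by Omni-RewardModel; the function names are ours.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood that the chosen response
    outranks the rejected one: -log(sigmoid(r_chosen - r_rejected))."""
    return -math.log(sigmoid(score_chosen - score_rejected))

# The loss shrinks as the reward model's score margin favors the chosen response:
loss_agree = pairwise_preference_loss(2.0, 0.5)    # model agrees with the label
loss_disagree = pairwise_preference_loss(0.5, 2.0) # model disagrees
assert loss_agree < loss_disagree
```

Because each training example is a single (chosen, rejected) pair with no room for ties, context, or per-user criteria, this objective illustrates the "preference rigidity" the paper aims to relax with free-form preferences.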