OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

Yuan Gong, Xionghui Wang, Jie Wu, Shiyin Wang, Yitong Wang, Xinglong Wu

2025-08-29

Summary

This paper introduces a new way to train AI models to be good at many different image editing tasks, like filling in missing parts of pictures or removing objects, all using a single 'reward' system.

What's the problem?

Traditionally, when you want an AI to do different image editing tasks, you need to train it separately for each one. This is because each task uses different kinds of data and has different ways of measuring success. This separate training is time-consuming and doesn't allow the AI to learn general skills that could help it with new tasks. It's like teaching someone to drive a car, then a truck, then a motorcycle – each requires completely separate lessons instead of building on the core skill of driving.

What's the solution?

The researchers created a system called OneReward that uses a single AI model to judge how well an image edit turned out, regardless of the specific task. This 'judge' model compares two versions of an edited image and decides which one is better according to the criterion the task requires. Using this OneReward signal, they trained a model called Seedream 3.0 Fill, which handles several image editing tasks without separate training for each one. It learns all of the tasks at once via reinforcement learning, starting directly from a pre-trained base model.
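The idea of one judge serving many tasks can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the vision-language model is replaced by a stub scorer, and all names (`vlm_prefers`, `preference_reward`, the criterion keys) are hypothetical.

```python
# Hypothetical sketch of OneReward's core idea: a single reward model judges
# pairwise winner/loser preferences across different mask-guided tasks, each
# with its own evaluation criterion. The real system uses a VLM on pixels;
# here a stub compares precomputed per-criterion scores.

from dataclasses import dataclass


@dataclass
class Candidate:
    image_id: str
    quality: dict  # per-criterion scores a real VLM would infer from the image


def vlm_prefers(criterion: str, a: Candidate, b: Candidate) -> str:
    """Stand-in for the VLM judge: name the winner under one criterion."""
    return a.image_id if a.quality[criterion] >= b.quality[criterion] else b.image_id


def preference_reward(criterion: str, candidate: Candidate, baseline: Candidate) -> float:
    """Binary RL reward: 1.0 if the policy's output beats the baseline, else 0.0."""
    return 1.0 if vlm_prefers(criterion, candidate, baseline) == candidate.image_id else 0.0


# One judge, many tasks: the same function scores different criteria.
policy_out = Candidate("policy_out", {"consistency": 0.9, "text_accuracy": 0.4})
baseline = Candidate("baseline", {"consistency": 0.7, "text_accuracy": 0.8})

r_fill = preference_reward("consistency", policy_out, baseline)    # image fill -> 1.0
r_text = preference_reward("text_accuracy", policy_out, baseline)  # text rendering -> 0.0
```

The key design point this toy illustrates is that the task and criterion are inputs to one shared judge, rather than each task having its own separately trained reward model.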

Why it matters?

This is important because it makes training AI for image editing much more efficient and allows the AI to perform better across a wider range of tasks. Instead of needing specialized training for each edit, the AI can learn a general understanding of what makes a good edit, leading to more versatile and powerful image editing tools that can even outperform existing professional software.

Abstract

In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances the model's generative capabilities across multiple tasks under different evaluation criteria using only One Reward model. By employing a single vision-language model (VLM) as the generative reward model, which can distinguish the winner and loser for a given task and a given evaluation criterion, it can be effectively applied to multi-task generation models, particularly in contexts with varied data and diverse task objectives. We utilize OneReward for mask-guided image generation, which can be further divided into several sub-tasks such as image fill, image extend, object removal, and text rendering, each involving a binary mask as the edit area. Although these domain-specific tasks share the same conditioning paradigm, they differ significantly in underlying data distributions and evaluation metrics. Existing methods often rely on task-specific supervised fine-tuning (SFT), which limits generalization and training efficiency. Building on OneReward, we develop Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning directly on a pre-trained base model, eliminating the need for task-specific SFT. Experimental results demonstrate that our unified edit model consistently outperforms both commercial and open-source competitors, such as Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across multiple evaluation dimensions. Code and model are available at: https://one-reward.github.io