UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

Yufeng Cheng, Wenxu Wu, Shaojin Wu, Mengqi Huang, Fei Ding, Qian He

2025-09-10

Summary

This paper introduces a new framework called UMO that improves image customization while keeping each person's identity accurate and consistent, even when multiple reference images are used.

What's the problem?

When you customize an image, especially one containing faces, it's hard to make changes without accidentally altering *who* the person looks like. Existing methods struggle to keep identities consistent when customizing with multiple reference images, often mixing up which face belongs to which person. In short, the more references and edits involved, the easier it is to lose each person's original look.

What's the solution?

The researchers developed UMO, which stands for Unified Multi-identity Optimization. It treats multi-reference customization as a matching puzzle: finding the best one-to-one assignment between the faces in the generated image and the reference identities. They then use reinforcement learning on diffusion models, with this matching as a reward signal, to keep each identity consistent throughout the customization process. To train UMO, they also built a new dataset of multi-reference images (both synthesized and real) and proposed a new metric for measuring identity confusion.
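To make the "global assignment" idea concrete, here is a minimal sketch of what such a matching could look like: given embeddings of the faces detected in a generated image and embeddings of the reference identities, find the one-to-one assignment that maximizes total similarity. All function and variable names here are illustrative assumptions, not UMO's actual API, and a brute-force search over permutations stands in for whatever solver the paper uses.

```python
from itertools import permutations
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_assignment(generated, references):
    """Return (score, mapping), where mapping[i] is the index of the
    reference identity assigned to generated face i under the
    highest-total-similarity one-to-one matching."""
    n = len(generated)
    best_score, best_map = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(cosine(generated[i], references[perm[i]]) for i in range(n))
        if score > best_score:
            best_score, best_map = score, perm
    return best_score, best_map
```

In a reward setting, the total matched similarity could serve as the score that reinforcement learning pushes the diffusion model to increase; for more than a handful of identities, a polynomial-time solver (e.g. the Hungarian algorithm) would replace the brute-force loop.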

Why it matters?

This work is important because it significantly improves the quality of image customization, especially when it comes to faces. By reducing identity confusion and improving consistency, UMO allows for more creative and reliable image editing, setting a new standard for open-source customization tools and opening up possibilities for more advanced applications.

Abstract

Recent advancements in image customization exhibit a wide range of application prospects due to stronger customization capabilities. However, since we humans are more sensitive to faces, a significant challenge remains in preserving consistent identity while avoiding identity confusion with multi-reference images, limiting the identity scalability of customization models. To address this, we present UMO, a Unified Multi-identity Optimization framework, designed to maintain high-fidelity identity preservation and alleviate identity confusion with scalability. With "multi-to-multi matching" paradigm, UMO reformulates multi-identity generation as a global assignment optimization problem and unleashes multi-identity consistency for existing image customization methods generally through reinforcement learning on diffusion models. To facilitate the training of UMO, we develop a scalable customization dataset with multi-reference images, consisting of both synthesised and real parts. Additionally, we propose a new metric to measure identity confusion. Extensive experiments demonstrate that UMO not only improves identity consistency significantly, but also reduces identity confusion on several image customization methods, setting a new state-of-the-art among open-source methods along the dimension of identity preserving. Code and model: https://github.com/bytedance/UMO
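The abstract mentions a new metric for identity confusion. As a hedged illustration (not the paper's actual metric), one simple way to quantify confusion is to count how often a generated face is closer to an unintended reference identity than to its intended one; the names below are assumptions for the sketch.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def confusion_rate(generated, references, intended):
    """Fraction of generated faces whose nearest reference identity is NOT
    the intended one (illustrative proxy, not the paper's exact metric)."""
    confused = 0
    for emb, target in zip(generated, intended):
        sims = [cosine(emb, ref) for ref in references]
        nearest = max(range(len(references)), key=lambda j: sims[j])
        if nearest != target:
            confused += 1
    return confused / len(generated)
```

A rate near 0 would mean each generated face is most similar to its own reference; a high rate would indicate identities bleeding into one another.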