OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

Yaoli Liu, Ziheng Ouyang, Shengtao Lou, Yiren Song

2025-12-01

Summary

This paper focuses on improving how well AI can edit images based on a reference image, specifically ensuring that fine details in the edited image closely match the reference.

What's the problem?

Current AI image generators, even advanced ones, have trouble keeping small details consistent when you edit an image to match a reference. This happens because these models compress images into a latent space (via a VAE), which throws away fine textures and details. When you try to add those details back in later, the result can have mismatched lighting, textures, or even shapes, making it look unnatural.

What's the solution?

The researchers created a system, called OmniRefiner, that refines images in two stages. First, they fine-tuned an existing single-image editor to take both the draft image and the reference image as input at the same time, so the refinement stays globally coherent and structurally faithful. Then, they used reinforcement learning to further improve the model's ability to make accurate local edits, explicitly rewarding detail accuracy and consistency with the reference image.
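The two-stage idea can be illustrated with a toy sketch. This is not the paper's implementation: the real system fine-tunes a diffusion editor and trains it with reinforcement learning, whereas here images are stand-in lists of numbers, and the "local" stage is a simple reward-maximizing search. All function names and parameters are hypothetical.

```python
# Toy illustration of a two-stage, reference-guided refinement
# (hypothetical simplification; images are 1-D lists of floats).

def global_refine(draft, reference, weight=0.5):
    """Stage 1 (stand-in for the fine-tuned editor): pull the whole
    draft toward the reference while keeping most of its structure."""
    return [d + weight * (r - d) for d, r in zip(draft, reference)]

def local_refine(draft, reference, candidates=(-0.1, 0.0, 0.1)):
    """Stage 2 (stand-in for RL-guided local editing): for each location,
    try a few candidate adjustments and keep the one with the highest
    reward, where the reward is closeness to the reference detail."""
    out = list(draft)
    for i in range(len(out)):
        best_val, best_reward = out[i], float("-inf")
        for delta in candidates:
            trial = out[i] + delta
            reward = -(trial - reference[i]) ** 2  # detail-accuracy reward
            if reward > best_reward:
                best_val, best_reward = trial, reward
        out[i] = best_val
    return out

draft = [0.1, 0.9, 0.4, 0.6]       # imperfect generated image
reference = [0.3, 0.7, 0.5, 0.5]   # image whose details we want to match
stage1 = global_refine(draft, reference)
stage2 = local_refine(stage1, reference)
```

Each stage reduces the mismatch with the reference: stage 1 coarsely, stage 2 by small, locally chosen corrections, mirroring the paper's global-then-local design.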

Why it matters?

This research matters because it makes AI image editing noticeably more accurate and realistic. It enables better restoration of old or damaged photos and more precise control over how images are modified, with quality that surpasses both current commercial tools and open-source alternatives.

Abstract

Reference-guided image generation has progressed rapidly, yet current diffusion models still struggle to preserve fine-grained visual details when refining a generated image using a reference. This limitation arises because VAE-based latent compression inherently discards subtle texture information, causing identity- and attribute-specific cues to vanish. Moreover, post-editing approaches that amplify local details based on existing methods often produce results inconsistent with the original image in terms of lighting, texture, or shape. To address this, we introduce OmniRefiner, a detail-aware refinement framework that performs two consecutive stages of reference-driven correction to enhance pixel-level consistency. We first adapt a single-image diffusion editor by fine-tuning it to jointly ingest the draft image and the reference image, enabling globally coherent refinement while maintaining structural fidelity. We then apply reinforcement learning to further strengthen localized editing capability, explicitly optimizing for detail accuracy and semantic consistency. Extensive experiments demonstrate that OmniRefiner significantly improves reference alignment and fine-grained detail preservation, producing faithful and visually coherent edits that surpass both open-source and commercial models on challenging reference-guided restoration benchmarks.