HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Yichen Liu, Donghao Zhou, Jie Wang, Xin Gao, Guisheng Liu, Jiatong Li, Quanwei Zhang, Qiang Lyu, Lanqing Guo, Shilei Wen, Weiqiang Wang, Pheng-Ann Heng

2026-03-06

Summary

This paper focuses on creating realistic images that show people interacting with products, like you see in ads or online stores.

What's the problem?

Generating these images is tricky because it's hard to make sure the product in the image looks exactly right and has all its details preserved. Existing methods that use a reference image of the product struggle with a few things: they don't have enough diverse examples to learn from, they have trouble really focusing on the product's details, and they don't get precise enough instructions on *how* to make the product look perfect.

What's the solution?

The researchers developed a system called HiFi-Inpaint. It uses an attention mechanism called Shared Enhancement Attention (SEA) to refine the product's fine-grained features, and a Detail-Aware Loss (DAL) that supervises the output at the pixel level using high-frequency maps, which emphasize edges and fine textures. They also built a new training dataset, HP-Image-40K (roughly 40,000 images), whose samples are curated from self-synthesized data and passed through automatic filtering to ensure quality.

Why does it matter?

This work is important because it allows for the creation of much more realistic and detailed human-product images. This can improve advertising, e-commerce, and digital marketing by making products look more appealing and trustworthy to potential customers.

Abstract

Human-product images, which showcase the integration of humans and products, play a vital role in advertising, e-commerce, and digital marketing. The essential challenge of generating such images lies in ensuring the high-fidelity preservation of product details. Among existing paradigms, reference-based inpainting offers a targeted solution by leveraging product reference images to guide the inpainting process. However, limitations remain in three key aspects: the lack of diverse large-scale training data, the struggle of current models to focus on product detail preservation, and the inability of coarse supervision to provide precise guidance. To address these issues, we propose HiFi-Inpaint, a novel high-fidelity reference-based inpainting framework tailored for generating human-product images. HiFi-Inpaint introduces Shared Enhancement Attention (SEA) to refine fine-grained product features and Detail-Aware Loss (DAL) to enforce precise pixel-level supervision using high-frequency maps. Additionally, we construct a new dataset, HP-Image-40K, with samples curated from self-synthesized data and processed with automatic filtering. Experimental results show that HiFi-Inpaint achieves state-of-the-art performance, delivering detail-preserving human-product images.
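
To make the idea of pixel-level supervision on high-frequency maps more concrete, the sketch below shows one way such a loss could look. It is an illustration only, not the paper's formulation: the abstract does not say how the high-frequency maps are computed or how the loss is weighted. The high-pass filter (image minus a 3x3 box blur), the L1 distance, the product mask, and the function names `high_frequency_map` and `detail_aware_loss` are all assumptions made for this example.

```python
import numpy as np


def high_frequency_map(img: np.ndarray) -> np.ndarray:
    """Assumed high-pass filter: the image minus a 3x3 box-blurred copy.

    `img` is a 2D grayscale float array. Flat regions map to ~0, while
    edges and fine textures produce large magnitudes.
    """
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    blurred = np.zeros_like(img, dtype=np.float64)
    # Sum the nine shifted copies that make up the 3x3 neighbourhood.
    for dy in range(3):
        for dx in range(3):
            blurred += padded[dy:dy + h, dx:dx + w]
    blurred /= 9.0
    return img - blurred


def detail_aware_loss(pred: np.ndarray, target: np.ndarray,
                      mask: np.ndarray) -> float:
    """Hypothetical loss: mean L1 distance between the high-frequency
    maps of `pred` and `target`, restricted to the product region
    (`mask` is 1 inside the product, 0 elsewhere)."""
    hf_diff = np.abs(high_frequency_map(pred) - high_frequency_map(target))
    # Guard against an empty mask to avoid division by zero.
    return float((hf_diff * mask).sum() / max(mask.sum(), 1.0))
```

Under these assumptions, an error in a fine detail inside the product region raises the loss, while a change well outside the mask leaves it unaffected, which is the kind of targeted supervision the abstract attributes to DAL.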