IMAGDressing-v1: Customizable Virtual Dressing
Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang
2024-07-18
Summary
This paper introduces IMAGDressing-v1, a new system for customizable virtual dressing that allows users to create and edit realistic images of people wearing different clothes in various settings.
What's the problem?
While recent technologies have made virtual try-on possible, they give merchants little flexibility in how garments are showcased: users cannot easily change how clothes are displayed, such as the pose, the background, or even the face of the model. Existing systems also struggle to represent garments comprehensively, which limits their usefulness for online shopping.
What's the solution?
To solve these issues, the authors developed IMAGDressing-v1, which generates freely editable images of humans wearing a fixed garment while letting users customize other conditions. They designed a dedicated garment UNet that captures the key features of the clothing and combined it with a hybrid attention module so that users can control different scenes through text prompts. They also released a large dataset called IGPair, containing over 300,000 pairs of clothing and dressed images, to support training and evaluation of the system.
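The hybrid attention idea described above can be illustrated with a minimal sketch: a frozen self-attention operates over the denoising UNet's image tokens, while a cross-attention injects garment features as keys and values, and the two outputs are blended. The function names, shapes, and the additive blending with weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def attention(q, k, v):
    # Standard scaled dot-product attention with a softmax over keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def hybrid_attention(image_tokens, garment_tokens, alpha=0.5):
    # Frozen self-attention over the image tokens of the denoising UNet,
    # plus a (notionally trainable) cross-attention whose keys/values
    # come from the garment UNet's features. `alpha` is a hypothetical
    # blending weight for this sketch.
    self_out = attention(image_tokens, image_tokens, image_tokens)
    cross_out = attention(image_tokens, garment_tokens, garment_tokens)
    return self_out + alpha * cross_out

# Example: 4 image tokens and 6 garment tokens, both 8-dimensional.
rng = np.random.default_rng(0)
out = hybrid_attention(rng.standard_normal((4, 8)),
                       rng.standard_normal((6, 8)))
print(out.shape)  # (4, 8)
```

In the real system the self-attention weights stay frozen to preserve the base model's text controllability, while only the cross-attention path is trained to absorb garment features.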
Why it matters?
This research is important because it enhances the online shopping experience by allowing more realistic and customizable virtual try-ons. By enabling merchants to showcase their products more effectively and giving users greater control over how they see clothing on models, IMAGDressing-v1 can lead to better purchasing decisions and improve customer satisfaction in e-commerce.
Abstract
Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we define a virtual dressing (VD) task focused on generating freely editable human images with fixed garments and optional conditions. Meanwhile, we design a comprehensive affinity metric index (CAMI) to evaluate the consistency between generated images and reference garments. Then, we propose IMAGDressing-v1, which incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE. We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet, ensuring users can control different scenes through text. IMAGDressing-v1 can be combined with other extension plugins, such as ControlNet and IP-Adapter, to enhance the diversity and controllability of generated images. Furthermore, to address the lack of data, we release the interactive garment pairing (IGPair) dataset, containing over 300,000 pairs of clothing and dressed images, and establish a standard pipeline for data assembly. Extensive experiments demonstrate that our IMAGDressing-v1 achieves state-of-the-art human image synthesis performance under various controlled conditions. The code and model will be available at https://github.com/muzishen/IMAGDressing.
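The abstract notes that the garment UNet draws on two frozen encoders: CLIP for semantic features and a VAE for texture features. The sketch below shows one plausible way such features could be fused, with placeholder projections standing in for the real encoders; the function names, dimensions, and the broadcast-and-concatenate fusion are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip_semantic(image):
    # Placeholder for a CLIP image encoder: global average pooling
    # followed by a random projection to an 8-dim semantic vector.
    return image.mean(axis=(0, 1)) @ rng.standard_normal((3, 8))

def vae_texture(image):
    # Placeholder for a VAE encoder: 2x spatial downsampling with a
    # random channel projection, preserving spatial texture layout.
    return image[::2, ::2, :] @ rng.standard_normal((3, 4))

def garment_features(image):
    # One plausible fusion scheme: broadcast the global semantic
    # vector over the spatial latent and concatenate along channels.
    sem = clip_semantic(image)                      # shape (8,)
    tex = vae_texture(image)                        # shape (h/2, w/2, 4)
    sem_map = np.broadcast_to(sem, tex.shape[:2] + (8,))
    return np.concatenate([tex, sem_map], axis=-1)  # shape (h/2, w/2, 12)

feats = garment_features(np.ones((4, 4, 3)))
print(feats.shape)  # (2, 2, 12)
```

The point of combining the two streams is that the CLIP branch carries garment category and style, while the VAE branch keeps the fine texture detail that a semantic embedding alone would lose.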