Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks
Hailong Guo, Bohan Zeng, Yiren Song, Wentao Zhang, Chuang Zhang, Jiaming Liu
2025-01-30
Summary
This paper introduces a new AI system called Any2AnyTryon that makes virtual clothing try-ons more flexible and realistic. It's like having a super smart digital fitting room that can put any piece of clothing on any person's photo, making online shopping easier and more fun.
What's the problem?
Current virtual try-on systems have two main issues. First, they don't have enough paired photos of people wearing specific clothes, which makes it hard for the AI to learn and create realistic results. Second, most systems can only do specific types of try-ons and aren't very user-friendly, often requiring special masks or poses to work properly.
What's the solution?
The researchers created Any2AnyTryon to solve these problems. They first made a huge dataset called LAION-Garment with lots of clothing try-on examples. Then, they developed a clever technique called 'adaptive position embedding' that helps the AI understand and work with images of different sizes and types. This allows Any2AnyTryon to put clothes on people's photos without needing special masks or poses, and it can even create new clothing images based on text descriptions.
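The paper does not publish the exact formula behind its adaptive position embedding, but the core idea, letting one model handle inputs of different sizes by mapping every image onto a shared coordinate range, can be sketched with standard 2-D sinusoidal embeddings. The sketch below is a hypothetical illustration, not the authors' implementation; the `ref_size` reference grid and the normalization scheme are assumptions for clarity.

```python
import numpy as np

def sinusoidal_1d(positions, dim):
    # Standard sinusoidal embedding for a 1-D array of coordinates.
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = np.outer(positions, freqs)                # (N, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def adaptive_pos_embed(h, w, dim, ref_size=64):
    # Rescale row/column indices onto a fixed reference grid so that
    # images of different resolutions share one coordinate range.
    # (Hypothetical reading of "adaptive position embedding";
    # ref_size=64 is an illustrative assumption.)
    ys = np.arange(h) * (ref_size / h)
    xs = np.arange(w) * (ref_size / w)
    emb_y = sinusoidal_1d(ys, dim // 2)                # (h, dim/2)
    emb_x = sinusoidal_1d(xs, dim // 2)                # (w, dim/2)
    grid = np.concatenate(
        [np.repeat(emb_y[:, None, :], w, axis=1),      # row part
         np.repeat(emb_x[None, :, :], h, axis=0)],     # column part
        axis=-1,
    )                                                  # (h, w, dim)
    return grid.reshape(h * w, dim)

# Two different input resolutions produce embeddings in the same space,
# and tokens at matching relative positions get identical embeddings.
pe_small = adaptive_pos_embed(16, 12, dim=64)          # (192, 64)
pe_large = adaptive_pos_embed(32, 24, dim=64)          # (768, 64)
```

Because both grids are normalized to the same reference range, a token halfway down a small garment crop and a token halfway down a full-resolution model photo receive the same positional signal, which is one plausible way a single model could compose inputs of mismatched sizes.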
Why it matters?
This matters because it could make online shopping for clothes much easier and more enjoyable. Imagine being able to see exactly how any piece of clothing would look on you just by uploading a photo, or even creating new outfits just by describing them. It could reduce the number of returns in online shopping, help people feel more confident in their purchases, and maybe even spark new trends in fashion design. Plus, the technology behind Any2AnyTryon could be used in other areas where we need AI to understand and work with images in flexible ways.
Abstract
Image-based virtual try-on (VTON) aims to generate a virtual try-on result by transferring an input garment onto a target person's image. However, the scarcity of paired garment-model data makes it challenging for existing methods to achieve high generalization and quality in VTON; this scarcity also limits the ability to generate mask-free try-on results. To tackle the data scarcity problem, approaches such as StableGarment and MMTryon use a synthetic data strategy, effectively increasing the amount of paired data on the model side. However, existing methods are typically limited to performing specific try-on tasks and lack user-friendliness. To enhance the generalization and controllability of VTON generation, we propose Any2AnyTryon, which can generate try-on results based on different textual instructions and model and garment images to meet various needs, eliminating the reliance on masks, poses, or other conditions. Specifically, we first construct the virtual try-on dataset LAION-Garment, the largest known open-source garment try-on dataset. Then, we introduce adaptive position embedding, which enables the model to generate satisfactory outfitted model images or garment images based on input images of different sizes and categories, significantly enhancing the generalization and controllability of VTON generation. In our experiments, we demonstrate the effectiveness of Any2AnyTryon and compare it with existing methods. The results show that Any2AnyTryon enables flexible, controllable, and high-quality image-based virtual try-on generation.
Project page: https://logn-2024.github.io/Any2anyTryonProjectPage/