
OmniTry: Virtual Try-On Anything without Masks

Yutong Feng, Linlin Zhang, Hengyuan Cao, Yiming Chen, Xiaoduan Feng, Jian Cao, Yuxiong Wu, Bin Wang

2025-08-20


Summary

This paper introduces OmniTry, a system that lets you virtually try on any kind of wearable item, not just clothes but also accessories like jewelry or hats. It works without needing segmentation masks for the photos, which makes it more practical in real-world situations.

What's the problem?

It's hard to get the kind of training data a system like this needs. To teach a model to try on different accessories, you usually need paired images: a picture of the item itself and a picture of a person wearing that exact item. Collecting such pairs is already difficult for clothing, and it becomes much harder across a wide variety of wearable items.

What's the solution?

The researchers came up with a two-stage approach. First, they used a massive number of ordinary photos of people already wearing all sorts of accessories (no pairs needed) and repurposed an image-inpainting model so that, given an empty mask, it learns to draw an item in a suitable position on its own, almost like an advanced editing tool. After that, they fine-tuned this model on a much smaller set of paired images so the drawn item keeps the look of the specific object being tried on; the authors found this second stage converges quickly even with few pairs. A rough sketch of this two-stage schedule is shown below.
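
To make the two-stage recipe concrete, here is a minimal, self-contained PyTorch sketch. The toy model, tensor shapes, and plain reconstruction loss are illustrative assumptions made for readability; the actual OmniTry model is a repurposed diffusion inpainting network with its own training objectives, so treat this only as a schematic of the data flow (empty mask in both stages, unpaired portraits first, a few true pairs afterwards).

```python
# Illustrative sketch only: ToyTryOnModel and train_step are hypothetical
# stand-ins, not OmniTry's actual architecture or loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyTryOnModel(nn.Module):
    """Stand-in for the inpainting backbone: person image + empty mask +
    object condition in, predicted try-on image out."""

    def __init__(self, channels: int = 3):
        super().__init__()
        # person (3) + mask (1) + object condition (3) -> image (3)
        self.net = nn.Sequential(
            nn.Conv2d(channels * 2 + 1, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, person, mask, condition):
        # Resize the object condition to the person resolution and stack channels.
        cond = F.interpolate(condition, size=person.shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.net(torch.cat([person, mask, cond], dim=1))


def train_step(model, optimizer, person, condition, target):
    """One step of either stage. The mask is always empty (all zeros), so the
    model must decide by itself where the object belongs (mask-free localization)."""
    empty_mask = torch.zeros_like(person[:, :1])
    pred = model(person, empty_mask, condition)
    loss = F.mse_loss(pred, target)  # simplified; the real model is a diffusion inpainter
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


model = ToyTryOnModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stage 1: large-scale unpaired portraits of people already wearing items.
# The target is the worn portrait itself and the condition is only a coarse
# hint, so the model learns plausible placement rather than copying a mask.
portrait = torch.rand(4, 3, 64, 64)
class_hint = torch.zeros(4, 3, 16, 16)
stage1_loss = train_step(model, optimizer, portrait, class_hint, target=portrait)

# Stage 2: a small set of true pairs (object image, try-on result) fine-tunes
# appearance consistency; the paper reports quick convergence at this point.
person = torch.rand(2, 3, 64, 64)
object_image = torch.rand(2, 3, 16, 16)
try_on_target = torch.rand(2, 3, 64, 64)
stage2_loss = train_step(model, optimizer, person, object_image, target=try_on_target)
print(f"stage-1 loss: {stage1_loss:.4f}, stage-2 loss: {stage2_loss:.4f}")
```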

Why it matters?

This is important because it makes virtual try-on much more versatile, letting people see how different accessories would look on them before buying. Because no masks or specially prepared photos are needed, it is also easier for businesses and consumers to adopt.

Abstract

Virtual Try-On (VTON) is a practical and widely applied task, for which most existing works focus on clothes. This paper presents OmniTry, a unified framework that extends VTON beyond garments to encompass any wearable object, e.g., jewelry and accessories, in a mask-free setting for more practical application. When extending to various types of objects, data curation is challenging for obtaining paired images, i.e., the object image and the corresponding try-on result. To tackle this problem, we propose a two-stage pipeline: in the first stage, we leverage large-scale unpaired images, i.e., portraits with any wearable items, to train the model for mask-free localization. Specifically, we repurpose the inpainting model to automatically draw objects in suitable positions given an empty mask. In the second stage, the model is further fine-tuned with paired images to transfer the consistency of object appearance. We observe that the model after the first stage converges quickly even with few paired samples. OmniTry is evaluated on a comprehensive benchmark consisting of 12 common classes of wearable objects, with both in-shop and in-the-wild images. Experimental results suggest that OmniTry achieves better performance on both object localization and ID-preservation compared with existing methods. The code, model weights, and evaluation benchmark of OmniTry will be made publicly available at https://omnitry.github.io/.
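
As a small illustration of what the mask-free setting means at inference time, the sketch below contrasts the inputs of a classic mask-based VTON call with the inputs the abstract describes. The helper function, the dummy images, and the commented-out model call are hypothetical; the point is only that the user supplies a person photo and an object image, while the mask stays entirely empty.

```python
# Hypothetical input preparation; the `model(...)` call at the end is a
# placeholder, not a released OmniTry API.
from PIL import Image
import numpy as np


def to_array(img: Image.Image) -> np.ndarray:
    """HWC uint8 -> CHW float32 in [0, 1]."""
    arr = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0
    return arr.transpose(2, 0, 1)


# Dummy images stand in for a user portrait and an in-shop object photo.
person = to_array(Image.new("RGB", (512, 512)))
item = to_array(Image.new("RGB", (128, 128)))

# Classic mask-based VTON would additionally require a binary mask marking
# where the item goes, e.g. drawn by the user or produced by a detector:
user_mask = np.zeros(person.shape[1:], dtype=np.float32)
user_mask[80:160, 300:380] = 1.0

# Mask-free setting: the mask stays all zeros; stage-1 training taught the
# model to pick a plausible position itself, and stage-2 fine-tuning keeps the
# drawn item consistent with `item`.
empty_mask = np.zeros(person.shape[1:], dtype=np.float32)
# result = model(person, item, empty_mask)  # hypothetical call
```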