Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

Seungyong Lee, Jeong-gi Kwak

2025-08-11

Summary

This paper introduces Voost, a diffusion transformer that handles both virtual try-on (dressing a person in an image with a given garment) and virtual try-off (recovering the garment from an image of the person wearing it) within a single model.

What's the problem?

Virtual try-on work usually focuses only on rendering a person realistically in new clothes, and accurately aligning the garment with the person's body is difficult, especially across varied poses and appearances. Supporting both try-on and try-off typically requires separate models.

What's the solution?

Voost learns virtual try-on and try-off jointly in one model. Because each garment-person pair supervises both directions during training, the two tasks 'teach' each other, which improves how garment and body parts align. It also uses additional techniques to make the model more flexible and to improve image quality.

Why does it matter?

This matters because it makes images in virtual clothing applications, such as online shopping and fashion design, more realistic and accurate, and it simplifies the system by combining two tasks in one model.

Abstract

Voost, a unified diffusion transformer framework, jointly learns virtual try-on and try-off, enhancing garment-body correspondence and achieving state-of-the-art results across benchmarks.
