The model's efficiency shows in three respects: a lightweight network with 899.06M parameters in total, parameter-efficient training with only 49.57M trainable parameters, and simplified inference that requires less than 8 GB of VRAM at 1024x768 resolution. As a result, it achieves better qualitative and quantitative results than baseline methods while needing fewer prerequisites and fewer trainable parameters.
Key features of CatVTON:
- Lightweight network with 899.06M parameters
- Parameter-efficient training with only 49.57M trainable parameters
- Simplified inference requiring less than 8 GB of VRAM at 1024x768 resolution
- No need for additional network modules, image encoders, or complex preprocessing steps
- Transfers garments of arbitrary categories onto target persons
- Produces realistic, high-quality try-on results
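
To put the parameter counts above in perspective, the short Python sketch below computes the share of parameters that are trained and a rough estimate of the memory needed just to hold the weights. The two counts come from the text. The helper names and the bytes-per-parameter values (2 for half precision, 4 for single precision) are assumptions made for illustration, and the estimate ignores activations and any other inference overhead, so it is not a measurement of CatVTON's actual VRAM use.

```python
# Parameter counts quoted in the text, in millions.
TOTAL_PARAMS_M = 899.06
TRAINABLE_PARAMS_M = 49.57


def trainable_fraction(trainable_m: float, total_m: float) -> float:
    """Return the share of parameters that are updated during training."""
    return trainable_m / total_m


def weight_memory_gib(params_m: float, bytes_per_param: int) -> float:
    """Return the memory needed to store the weights alone, in GiB.

    bytes_per_param is an assumption about numeric precision:
    2 for 16-bit floats, 4 for 32-bit floats.
    """
    return params_m * 1e6 * bytes_per_param / 2**30


fraction = trainable_fraction(TRAINABLE_PARAMS_M, TOTAL_PARAMS_M)
print(f"Trainable share of parameters: {fraction:.2%}")

# Assumed precisions; the text does not say which one inference uses.
for label, nbytes in (("16-bit", 2), ("32-bit", 4)):
    gib = weight_memory_gib(TOTAL_PARAMS_M, nbytes)
    print(f"Weights only, {label}: {gib:.2f} GiB")
```

Under these assumptions, only about one parameter in eighteen is trained, and the weights alone fit comfortably within the quoted 8 GB budget at either precision.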