CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Quang-Binh Nguyen, Minh Luu, Quang Nguyen, Anh Tran, Khoi Nguyen
2025-07-21
Summary
This paper introduces CSD-VAR, an approach built on visual autoregressive (VAR) modeling that improves how a model separates the content and style of an image, giving better preservation of content details and more flexible stylization.
What's the problem?
Many current models struggle to cleanly separate what an image depicts (content) from how it looks (style), so a change in style can distort the content or lose important details.
What's the solution?
The authors introduce three techniques: scale-aware optimization, which adjusts for the different scales of image features; SVD-based rectification, which corrects errors that blur the boundary between content and style; and an augmented key-value (K-V) memory mechanism, which helps keep the content intact while the style changes.
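To make the idea of SVD-based rectification more concrete, here is a minimal sketch of one way such a step could work. It is not the authors' implementation: it assumes that content information occupies a low-rank subspace that can be estimated by SVD from a handful of content embeddings, and that a style embedding can be cleaned by projecting that subspace out. The function `rectify_style`, the parameter `k`, and the toy vectors are all hypothetical names chosen for illustration.

```python
import numpy as np

def rectify_style(style_vec, content_vecs, k=1):
    """Remove from style_vec its component in the top-k content subspace.

    ASSUMPTION: content_vecs is an (n, d) array of hypothetical content
    embeddings; the summary does not say how the paper obtains them.
    """
    # Rows of vt are orthonormal right singular vectors, ordered by
    # decreasing singular value; the first k span the dominant content directions.
    _, _, vt = np.linalg.svd(content_vecs, full_matrices=False)
    basis = vt[:k]                              # shape (k, d)
    # Part of the style vector that lies inside the content subspace.
    leaked = basis.T @ (basis @ style_vec)      # shape (d,)
    return style_vec - leaked

# Toy data: five content embeddings that all point along the first axis.
rng = np.random.default_rng(0)
content_dir = np.array([1.0, 0.0, 0.0, 0.0])
content_vecs = np.outer(rng.uniform(1.0, 2.0, size=5), content_dir)

# A style embedding that has picked up some content (its first coordinate).
style_vec = np.array([0.5, 1.0, -2.0, 0.3])
rectified = rectify_style(style_vec, content_vecs, k=1)
print(rectified)
```

In this toy setup the rectified vector keeps its last three coordinates and loses its component along the content direction, so it is orthogonal to `content_dir`. Choosing `k` trades off how much suspected content is removed against how much style information is retained.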
Why does it matter?
This matters because it lets a model generate images with accurate content and customized styles, outperforming diffusion models in content preservation and stylization while offering more creative control.
Abstract
CSD-VAR, a Visual Autoregressive Modeling approach, enhances content-style decomposition by introducing scale-aware optimization, SVD-based rectification, and augmented K-V memory, outperforming diffusion models in content preservation and stylization.