Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
Tsai-Shien Chen, Aliaksandr Siarohin, Guocheng Gordon Qian, Kuan-Chieh Jackson Wang, Egor Nemchinov, Moayed Haji-Ali, Riza Alp Guler, Willi Menapace, Ivan Skorokhodov, Anil Kag, Jun-Yan Zhu, Sergey Tulyakov
2025-12-12
Summary
This paper introduces a way to change specific aspects of an image, such as a person's expression or the lighting, without disturbing the rest of the picture. Crucially, the attributes you can target are described in free-form text rather than chosen from a fixed list, which is what "open-vocabulary" means here.
What's the problem?
Current methods for changing image attributes aren't very precise. They represent an image with a single holistic embedding, so multiple visual factors get tangled together, making it hard to isolate and change just one of them. Imagine trying to change someone's smile without also changing their hair color: existing techniques often struggle with that kind of targeted edit, leaking unintended changes or producing incoherent results because the encoder "sees" the image as one inseparable whole.
What's the solution?
The researchers built a system called "Omni-Attribute," an open-vocabulary attribute encoder designed to learn separate, attribute-specific representations. They did this in two main ways. First, they curated a dataset of semantically linked image pairs, each annotated with positive attributes (what the encoder should preserve) and negative attributes (what it should suppress), explicitly teaching the system which parts of an image relate to which attributes. Second, they trained the encoder with a dual objective that balances generative fidelity, so the output stays realistic, against contrastive disentanglement, so different attributes stay separate and edits don't bleed into one another. A rough sketch of this dual objective appears below.
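To make the dual-objective idea concrete, here is a minimal PyTorch sketch. It is not the paper's actual implementation: the generative term is approximated by a simple reconstruction loss (the real system presumably uses a diffusion-style objective), the disentanglement term is written as a generic InfoNCE contrastive loss, and all names and hyperparameters (`w_contrast`, `temperature`, the embedding sizes) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_disentanglement(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style loss: pull the anchor embedding toward the embedding of the
    same (positive) attribute in the paired image, and push it away from the
    embeddings of other (negative) attributes. Shapes are assumptions."""
    anchor = F.normalize(anchor, dim=-1)        # (B, D)
    positive = F.normalize(positive, dim=-1)    # (B, D)
    negatives = F.normalize(negatives, dim=-1)  # (B, K, D)
    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)  # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, negatives)  # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long)   # positive sits at index 0
    return F.cross_entropy(logits, labels)

def dual_objective(x_recon, x_target, anchor, positive, negatives, w_contrast=0.5):
    gen_loss = F.mse_loss(x_recon, x_target)  # stand-in for the real generative term
    dis_loss = contrastive_disentanglement(anchor, positive, negatives)
    return gen_loss + w_contrast * dis_loss

# Toy usage: batch of 4, 256-d attribute embeddings, 8 negative attributes each.
B, D, K = 4, 256, 8
loss = dual_objective(
    x_recon=torch.randn(B, 3, 64, 64),
    x_target=torch.randn(B, 3, 64, 64),
    anchor=torch.randn(B, D),
    positive=torch.randn(B, D),
    negatives=torch.randn(B, K, D),
)
print(f"combined loss: {loss.item():.4f}")
```

The key design choice this sketch tries to capture is that the two terms pull in opposite directions: the generative term rewards embeddings rich enough to reconstruct the target, while the contrastive term penalizes embeddings that carry information about attributes other than the one being queried.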
Why it matters?
This work enables more realistic and flexible image personalization. Instead of choosing from a fixed menu of edits, you can describe the attribute you want to transfer or change, and the system attempts to apply it accurately while leaving everything else intact. Potential applications include personalized avatars, precisely controlled photo editing, and generating images from complex compositional descriptions.
Abstract
Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on holistic embeddings from general-purpose image encoders, which entangle multiple visual factors and make it difficult to isolate a single attribute. This often leads to information leakage and incoherent synthesis. To address this limitation, we introduce Omni-Attribute, the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. Our approach jointly designs the data and model: (i) we curate semantically linked image pairs annotated with positive and negative attributes to explicitly teach the encoder what to preserve or suppress; and (ii) we adopt a dual-objective training paradigm that balances generative fidelity with contrastive disentanglement. The resulting embeddings prove effective for open-vocabulary attribute retrieval, personalization, and compositional generation, achieving state-of-the-art performance across multiple benchmarks.
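The abstract also mentions open-vocabulary attribute retrieval. The sketch below illustrates how attribute-conditioned embeddings could support that task: rank a gallery by cosine similarity to a query image's embedding for one named attribute. Everything here is hypothetical; `encode` is a random-projection placeholder for the trained encoder, and `retrieve` and its `top_k` parameter are invented for illustration, not the paper's API.

```python
import torch
import torch.nn.functional as F

def encode(image: torch.Tensor, attribute: str) -> torch.Tensor:
    # Placeholder for the trained encoder: a real model would condition on the
    # attribute text; here we just seed a random projection from the string.
    torch.manual_seed(hash(attribute) % (2**31))
    proj = torch.randn(image.numel(), 256)
    return image.flatten() @ proj

def retrieve(query_img, gallery_imgs, attribute, top_k=3):
    # Embed the query and every gallery image for the SAME attribute, then
    # rank by cosine similarity (dot product of normalized vectors).
    q = F.normalize(encode(query_img, attribute), dim=-1)
    g = F.normalize(torch.stack([encode(im, attribute) for im in gallery_imgs]), dim=-1)
    scores = g @ q
    return scores.topk(min(top_k, len(gallery_imgs)))

query = torch.randn(3, 32, 32)
gallery = [torch.randn(3, 32, 32) for _ in range(10)]
values, indices = retrieve(query, gallery, attribute="lighting")
print(indices.tolist())  # gallery indices closest to the query's lighting
```

The point of conditioning the embedding on the attribute string is that the same pair of images can be near neighbors under one attribute (say, "lighting") and far apart under another (say, "identity"), which a single holistic embedding cannot express.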