NearID: Identity Representation Learning via Near-identity Distractors
Aleksandar Cvejic, Rameen Abdal, Abdelrahman Eldesokey, Bernard Ghanem, Peter Wonka
2026-04-03
Summary
This paper focuses on a weakness in how computers 'see' and understand objects, specifically when trying to personalize things like image generation or editing. Current systems struggle to focus on *what* an object is, getting confused by the background it's in.
What's the problem?
When a computer is asked to identify something, like a specific person in a photo, it often doesn't focus on the person themselves. Instead, it relies on clues from the surrounding environment – the background. This means if you change the background, the computer might get confused and think it's looking at something else, even if it's the same object. This is a big problem for tasks where you want to reliably identify individuals or objects regardless of where they are.
What's the solution?
The researchers created a new way to test and train these computer vision systems. They built a dataset called NearID, which includes many images where a similar but distinct object (a "near-identity distractor") is placed on the exact same background as a reference image. This removes background clues as a shortcut and forces the computer to focus solely on the object's unique features. They then developed a training method that teaches the system to recognize the object even when these similar-looking distractors share its scene. The training uses a two-tier objective that enforces an ordering: the same object should match best, then very similar distractors, and finally completely different objects.
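The two-tier ordering described above can be sketched as a pair of margin losses over embedding similarities. This is a minimal illustration, not the paper's actual implementation: the function name, the use of cosine similarity, and the margin values are all assumptions.

```python
import numpy as np

def two_tier_loss(anchor, positive, near, rand, m1=0.2, m2=0.2):
    """Sketch of a two-tier margin loss enforcing, per anchor:
    sim(anchor, same identity) > sim(anchor, NearID distractor) > sim(anchor, random negative).
    Margins m1/m2 are illustrative; the paper's exact loss may differ."""
    def cos(a, b):
        # cosine similarity between row-wise L2-normalized embeddings
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return (a * b).sum(axis=-1)

    s_pos, s_near, s_rand = cos(anchor, positive), cos(anchor, near), cos(anchor, rand)
    tier1 = np.maximum(0.0, m1 - (s_pos - s_near))   # same identity must beat the distractor
    tier2 = np.maximum(0.0, m2 - (s_near - s_rand))  # distractor must beat a random negative
    return float((tier1 + tier2).mean())
```

When the ordering is already satisfied by at least the margins, both hinge terms are zero and the loss vanishes; any violation of either tier contributes a linear penalty.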
Why it matters?
This work is important because it makes personalized AI – like creating images of *you* in different scenarios – much more accurate and reliable. By forcing computers to truly understand object identity, we can build systems that are less easily fooled and better at tasks that require recognizing specific individuals or items, leading to more realistic and useful AI applications.
Abstract
When evaluating identity-focused tasks such as personalized generation and image editing, existing vision encoders entangle object identity with background context, leading to unreliable representations and metrics. We introduce the first principled framework to address this vulnerability using Near-identity (NearID) distractors, where semantically similar but distinct instances are placed on the exact same background as a reference image, eliminating contextual shortcuts and isolating identity as the sole discriminative signal. Based on this principle, we present the NearID dataset (19K identities, 316K matched-context distractors) together with a strict margin-based evaluation protocol. Under this setting, pre-trained encoders perform poorly, achieving Sample Success Rates (SSR), a strict margin-based identity discrimination metric, as low as 30.7% and often ranking distractors above true cross-view matches. We address this by learning identity-aware representations on a frozen backbone using a two-tier contrastive objective enforcing the hierarchy: same identity > NearID distractor > random negative. This improves SSR to 99.2%, enhances part-level discrimination by 28.0%, and yields stronger alignment with human judgments on DreamBench++, a human-aligned benchmark for personalization. Project page: https://gorluxor.github.io/NearID/