
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Enis Simsar, Thomas Hofmann, Federico Tombari, Pinar Yanardag

2024-12-13


Summary

This paper introduces LoRACLR, a method for merging multiple customized (LoRA) diffusion models into one, so that personalized images can depict several concepts at once without losing the distinct characteristics of each.

What's the problem?

Each personalized model captures one concept, but naively combining several models causes attribute entanglement: features from one concept bleed into another, so each concept's unique qualities are lost. Existing methods that avoid this typically require separate, joint retraining for every new combination of concepts, which is inefficient.

What's the solution?

LoRACLR merges multiple LoRA models, each fine-tuned for a specific concept, into a single unified model without retraining any individual concept. It uses a contrastive objective to align the models' weight spaces: representations of the same concept are pulled together while different concepts are kept apart, so the merged model stays compatible with each original while minimizing interference. The result is high-quality image generation that composes multiple concepts effectively.
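To make the idea of a contrastive merging objective concrete, here is a toy sketch. It is an assumption-laden illustration, not the paper's actual loss: `merged_out[i]` stands in for the merged model's features on concept i's prompt, and `concept_outs[i]` for the original single-concept LoRA's features. Same-concept (positive) pairs are pulled together; different-concept (negative) pairs are pushed at least a margin apart.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_merge_loss(merged_out, concept_outs, margin=1.0):
    """Toy contrastive objective in the spirit of LoRACLR.

    Hypothetical form, not the paper's exact loss. For each concept i,
    the merged model's output should match that concept's own LoRA
    output (positive pair), while outputs for different concepts are
    kept at least `margin` apart (negative pairs, hinge penalty).
    """
    n = len(concept_outs)
    loss = 0.0
    for i in range(n):
        # Positive term: merged output reproduces concept i's LoRA output.
        loss += sq_dist(merged_out[i], concept_outs[i])
        for j in range(n):
            if j == i:
                continue
            # Negative term: penalize distinct concepts that sit closer
            # than the margin, pushing their representations apart.
            d = math.sqrt(sq_dist(merged_out[i], concept_outs[j]))
            loss += max(0.0, margin - d) ** 2
    return loss / n
```

With this shape of objective, a merge that exactly reproduces each concept's original output while keeping concepts well separated incurs zero loss, whereas any drift toward another concept's representation is penalized.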

Why it matters?

This research matters because it makes multi-concept personalized image generation practical: several independently trained concepts can be composed coherently without joint retraining. By improving how customized models are combined, LoRACLR opens up new possibilities in art, design, and any field that relies on generating customized visual content.

Abstract

Recent advances in text-to-image customization have enabled high-fidelity, context-rich generation of personalized images, allowing specific concepts to appear in a variety of scenarios. However, current methods struggle with combining multiple personalized models, often leading to attribute entanglement or requiring separate training to preserve concept distinctiveness. We present LoRACLR, a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model without additional individual fine-tuning. LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference. By enforcing distinct yet cohesive representations for each concept, LoRACLR enables efficient, scalable model composition for high-quality, multi-concept image synthesis. Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.