LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Donald Shenaj, Ondrej Bohdal, Mete Ozay, Pietro Zanuttigh, Umberto Michieli

2024-12-11

Summary

This paper introduces LoRA.rar, a new method for merging low-rank adaptation parameters (LoRAs) to create personalized images quickly and efficiently.

What's the problem?

Creating personalized images that combine specific subjects and artistic styles can be slow and resource-intensive, especially on devices like smartphones. Previous methods for merging LoRAs required a lot of computing power and time, making them impractical for real-time use.

What's the solution?

The authors introduce LoRA.rar, which speeds up the merging process by over 4000 times. It does this by pre-training a hypernetwork on a diverse set of content-style LoRA pairs: the hypernetwork learns a merging strategy that generalizes to new, unseen pairs, so a fresh subject and style can be combined in a single forward pass instead of a costly per-pair optimization. This allows the model to generate high-quality personalized images quickly without needing extensive computational resources; a rough sketch of the idea follows.
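To make the idea concrete, here is a minimal PyTorch sketch of hypernetwork-based LoRA merging. The `summarize` descriptor, the network sizes, and the use of a single scalar coefficient per LoRA per layer are all illustrative assumptions, not the paper's actual architecture; the point is only that merging becomes one forward pass rather than an optimization loop.

```python
import torch
import torch.nn as nn

def summarize(dW: torch.Tensor) -> torch.Tensor:
    # Tiny fixed-size descriptor of one LoRA weight update (illustrative choice).
    return torch.stack([dW.mean(), dW.std(), dW.norm()])

class MergerHypernetwork(nn.Module):
    """Sketch: predicts merging coefficients for a content/style LoRA pair."""

    def __init__(self, feat_dim: int = 3, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # one coefficient per LoRA
        )

    def forward(self, content_feat, style_feat):
        logits = self.net(torch.cat([content_feat, style_feat], dim=-1))
        return logits.softmax(dim=-1)  # normalized (w_content, w_style)

def merge_layer(hyper, dW_content, dW_style):
    w_c, w_s = hyper(summarize(dW_content), summarize(dW_style))
    # A single forward pass replaces per-pair optimization -- this is
    # where the large merging speedup comes from.
    return w_c * dW_content + w_s * dW_style

# Toy usage: merge random low-rank updates for one layer.
hyper = MergerHypernetwork()
A_c, B_c = torch.randn(64, 4), torch.randn(4, 64)  # content LoRA factors
A_s, B_s = torch.randn(64, 4), torch.randn(4, 64)  # style LoRA factors
merged = merge_layer(hyper, A_c @ B_c, A_s @ B_s)  # shape (64, 64)
```

In the paper's setup, such a hypernetwork would be trained once on many content-style LoRA pairs; afterwards, merging a new pair requires no further training at all.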

Why it matters?

This research is important because it makes personalized image generation more accessible and practical for everyday users. By significantly improving the speed and efficiency of merging styles and subjects, LoRA.rar opens up new possibilities for creative applications in areas like social media, gaming, and digital art.

Abstract

Recent advancements in image generation models have enabled personalized image creation with both user-defined subjects (content) and styles. Prior works achieved personalization by merging corresponding low-rank adaptation parameters (LoRAs) through optimization-based methods, which are computationally demanding and unsuitable for real-time use on resource-constrained devices like smartphones. To address this, we introduce LoRA.rar, a method that not only improves image quality but also achieves a remarkable speedup of over 4000× in the merging process. LoRA.rar pre-trains a hypernetwork on a diverse set of content-style LoRA pairs, learning an efficient merging strategy that generalizes to new, unseen content-style pairs, enabling fast, high-quality personalization. Moreover, we identify limitations in existing evaluation metrics for content-style quality and propose a new protocol using multimodal large language models (MLLM) for more accurate assessment. Our method significantly outperforms the current state of the art in both content and style fidelity, as validated by MLLM assessments and human evaluations.
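The abstract also proposes judging content-style fidelity with an MLLM. Below is a hypothetical sketch of what such a judging prompt could look like; the function name, the prompt wording, and the commented-out `mllm.judge` client are all made up for illustration and do not reproduce the paper's actual protocol.

```python
def build_eval_prompt(subject_desc: str, style_desc: str) -> str:
    """Hypothetical prompt for an MLLM judge (not the paper's wording)."""
    return (
        "You are shown a generated image (attached).\n"
        f"1. Does the image faithfully depict this subject: {subject_desc}? "
        "Answer yes or no.\n"
        f"2. Is the image rendered in this style: {style_desc}? "
        "Answer yes or no.\n"
        "Answer the two questions independently."
    )

prompt = build_eval_prompt("a corgi dog", "watercolor painting")
# verdict = mllm.judge(image, prompt)  # hypothetical MLLM client call
```

The appeal of this kind of protocol is that a single yes/no judgment per axis (content, style) is easy to aggregate across many images and correlates better with human preferences than embedding-similarity metrics, which is the limitation the authors identify in existing evaluations.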