RandLoRA: Full-rank parameter-efficient fine-tuning of large models

Paul Albert, Frederic Z. Zhang, Hemanth Saratchandran, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad

2025-02-04

Summary

This paper introduces RandLoRA, a new method for fine-tuning large AI models that combines efficiency with powerful updates. It improves on previous techniques like LoRA by allowing full-rank updates while keeping memory and computational costs low.

What's the problem?

Fine-tuning large AI models is expensive and requires a lot of computational resources, especially when trying to adapt them for specific tasks. Methods like LoRA reduce memory usage by limiting updates to low-rank matrices, but this approach can weaken the model’s ability to handle complex tasks, creating a gap in performance compared to standard fine-tuning.
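To see why the low-rank constraint matters, here is a minimal numerical sketch (not the paper's code; the dimensions and scalings are illustrative assumptions). LoRA replaces a dense weight update with the product of two thin trainable matrices, so the update's rank can never exceed the chosen rank `r`:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 128, 128, 8   # layer dimensions and LoRA rank (illustrative values)

# LoRA's two trainable factors (random stand-ins here; in practice B is
# initialized to zero so training starts from the pretrained weights)
A = rng.standard_normal((r, k)) / np.sqrt(k)
B = rng.standard_normal((d, r)) / np.sqrt(r)

# The weight update is their product: only d*r + r*k trainable parameters
# instead of d*k, but rank(delta_W) is mathematically capped at r.
delta_W = B @ A
print(delta_W.shape)                    # (128, 128)
print(np.linalg.matrix_rank(delta_W))   # at most r = 8
```

This cap on rank is the "rank deficiency" the paper asks about: the update touches a full-size weight matrix, but can only move it within a low-dimensional subspace.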

What's the solution?

The researchers developed RandLoRA, which builds full-rank updates from learned linear combinations of fixed random low-rank matrices, without increasing the number of trainable parameters or memory usage. By restricting optimization to small diagonal scaling matrices applied to these fixed random matrices, RandLoRA overcomes the rank limitation of low-rank methods while remaining efficient. Experiments showed that RandLoRA outperforms LoRA across vision, language, and vision-language tasks, often closing the performance gap with standard fine-tuning.
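The core idea can be sketched numerically as follows. This is a simplified illustration of the mechanism, not the paper's exact parameterization: the number of bases, their shapes, and the scalings are assumptions. Each random basis contributes a rank-`r` term, but summing enough independently scaled terms recovers a full-rank update while only the tiny diagonal matrices are trained:

```python
import numpy as np

rng = np.random.default_rng(0)
d = k = 64
r, n_bases = 8, 8   # n_bases * r = 64 = min(d, k), enough to span full rank

# Fixed, NON-trainable random low-rank bases (frozen throughout training)
Bs = [rng.standard_normal((d, r)) / np.sqrt(d) for _ in range(n_bases)]
As = [rng.standard_normal((r, k)) / np.sqrt(r) for _ in range(n_bases)]

# Only small diagonal scaling matrices are optimized (random stand-ins here):
# n_bases * r = 64 trainable values versus d * k = 4096 for full fine-tuning.
Ls = [np.diag(rng.standard_normal(r)) for _ in range(n_bases)]

# Each term B_i @ L_i @ A_i has rank <= r, but the sum of n_bases
# independently scaled random terms is full-rank with high probability.
delta_W = sum(B @ L @ A for B, L, A in zip(Bs, Ls, As))
print(np.linalg.matrix_rank(delta_W))   # 64, i.e. full rank
```

The design trade-off this illustrates: the expressive directions come "for free" from frozen random matrices, so the parameter budget is spent only on how strongly each random direction is used.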

Why it matters?

This research is important because it makes fine-tuning large AI models more accessible and effective for complex tasks. RandLoRA provides a way to achieve high performance without requiring expensive hardware or excessive memory, making advanced AI tools more practical for real-world applications in areas like image analysis, language understanding, and multimodal tasks.

Abstract

Low-Rank Adaptation (LoRA) and its variants have shown impressive results in reducing the number of trainable parameters and memory requirements of large transformer networks while maintaining fine-tuning performance. However, the low-rank nature of the weight update inherently limits the representation power of fine-tuned models, potentially compromising performance on complex tasks. This raises a critical question: when a performance gap between LoRA and standard fine-tuning is observed, is it due to the reduced number of trainable parameters or the rank deficiency? This paper aims to answer this question by introducing RandLoRA, a parameter-efficient method that performs full-rank updates using a learned linear combination of low-rank, non-trainable random matrices. Our method limits the number of trainable parameters by restricting optimization to diagonal scaling matrices applied to the fixed random matrices. This allows us to effectively overcome the low-rank limitations while maintaining parameter and memory efficiency during training. Through extensive experimentation across vision, language, and vision-language benchmarks, we systematically evaluate the limitations of LoRA and existing random basis methods. Our findings reveal that full-rank updates are beneficial across vision and language tasks individually, and even more so for vision-language tasks, where RandLoRA significantly reduces -- and sometimes eliminates -- the performance gap between standard fine-tuning and LoRA, demonstrating its efficacy.