Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu Cheng
2025-02-25
Summary
This paper introduces GOAT (Great LoRA Mixture-of-Expert), a new method for improving LoRA, a technique for efficiently fine-tuning large language models.
What's the problem?
LoRA is a popular way to adapt big AI models to specific tasks without changing all of their parameters, but it often doesn't perform as well as fully fine-tuning the entire model. Current methods for improving LoRA have limitations that prevent them from fully using the model's pre-existing knowledge or from working well with more complex model structures.
What's the solution?
The researchers created GOAT, which does two main things. First, it builds the adapters as a Mixture-of-Experts, a structure in which different parts of the model specialize in different inputs, and it initializes each expert from a different slice of the model's existing knowledge (obtained via singular value decomposition), so the pieces most relevant to the new task are adaptively combined. Second, it derives a scaling factor that makes this lightweight setup train the way a fully fine-tuned Mixture-of-Experts would, without changing the architecture or the training algorithm.
Why it matters?
This matters because it could make it much easier and cheaper to adapt large AI models for specific tasks without losing performance. By closing the gap between LoRA and full fine-tuning, GOAT could allow more researchers and companies to work with advanced AI models, potentially leading to new applications and improvements in areas like language understanding, reasoning, and image classification.
Abstract
While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal leveraging of pre-trained knowledge. Another path for improving LoRA is incorporating a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to adopt SVD priors in the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with that of a fully fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE's efficiency and performance. Experiments across 25 datasets, including natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT's state-of-the-art performance, closing the gap with Full FT.
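To make the abstract's two ingredients concrete, here is a minimal sketch of what an SVD-structured mixture of low-rank experts could look like: a frozen linear layer whose update comes from several LoRA experts, each initialized from a different block of the pre-trained weight's singular value decomposition and combined by a learned top-k router. This is an illustrative sketch, not the authors' implementation; the class name, expert count, rank, top-k routing, and the `scale` hyperparameter are assumptions, with `scale` standing in for the theoretical scaling factor GOAT derives.

```python
import torch
import torch.nn as nn


class SVDLoRAMoE(nn.Module):
    """Illustrative sketch (not the authors' code): a frozen linear layer whose
    low-rank update comes from a mixture of LoRA experts, each initialized from
    a different block of the pre-trained weight's SVD."""

    def __init__(self, weight: torch.Tensor, num_experts: int = 4,
                 rank: int = 8, top_k: int = 2, scale: float = 2.0):
        super().__init__()
        out_features, in_features = weight.shape
        assert num_experts * rank <= min(out_features, in_features)

        self.weight = nn.Parameter(weight.detach().clone(), requires_grad=False)  # frozen W
        self.router = nn.Linear(in_features, num_experts, bias=False)
        self.top_k, self.scale = top_k, scale  # `scale` stands in for GOAT's derived factor

        # Split the SVD of W into contiguous blocks of singular directions and
        # hand each block to one expert as its (A, B) initialization.
        U, S, Vh = torch.linalg.svd(weight.detach(), full_matrices=False)
        self.A, self.B = nn.ParameterList(), nn.ParameterList()
        for e in range(num_experts):
            lo, hi = e * rank, (e + 1) * rank
            sqrt_s = S[lo:hi].sqrt()
            self.A.append(nn.Parameter(Vh[lo:hi] * sqrt_s.unsqueeze(1)))    # (rank, in)
            self.B.append(nn.Parameter(U[:, lo:hi] * sqrt_s.unsqueeze(0)))  # (out, rank)
        # NOTE: a complete implementation would typically also compensate the frozen
        # weight so this non-zero initialization leaves the output unchanged at step 0.

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T                       # frozen pre-trained path
        gate = torch.softmax(self.router(x), dim=-1)   # (..., num_experts)
        topv, topi = gate.topk(self.top_k, dim=-1)
        delta = torch.zeros_like(base)
        for slot in range(self.top_k):                 # sparse top-k routing
            for e in range(len(self.A)):
                mask = (topi[..., slot] == e).unsqueeze(-1).to(x.dtype)
                expert_out = (x @ self.A[e].T) @ self.B[e].T
                delta = delta + mask * topv[..., slot:slot + 1] * expert_out
        return base + self.scale * delta
```

In GOAT itself, the scaling factor is derived theoretically so that this lightweight mixture optimizes like a fully fine-tuned MoE, rather than being hand-tuned as in the sketch above.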