GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
Maxim Zhelnin, Viktor Moskvoretskii, Egor Shvetsov, Egor Venediktov, Mariya Krylova, Aleksandr Zuev, Evgeny Burnaev
2024-09-02

Summary
This paper introduces GIFT-SW, a new method for fine-tuning large language models (LLMs) that improves their performance while using fewer computational resources.
What's the problem?
Fine-tuning LLMs usually requires substantial computational power and can lead to catastrophic forgetting, where the model loses previously learned capabilities as it is trained on new information. This makes it challenging to adapt these models to different tasks without hurting their overall effectiveness.
What's the solution?
GIFT-SW updates only the most important columns of the model's weight matrices (the salient weights) while injecting Gaussian noise into the less important ones. This lets the model adapt to new tasks while maintaining its performance, without retraining all of its parameters from scratch. The authors also propose a generalized sensitivity metric to identify which weights are salient, making the selection more principled and efficient.
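A minimal PyTorch-style sketch of this idea is shown below. It assumes the salient column indices have already been selected, and the noise scale `noise_std` is a hypothetical hyperparameter; this illustrates the mechanism rather than reproducing the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GIFTSWLinear(nn.Module):
    """Sketch of a linear layer fine-tuned in the GIFT-SW style: only the
    pre-selected salient columns are trainable, and Gaussian noise is
    injected into the frozen non-salient columns during training."""

    def __init__(self, base: nn.Linear, salient_cols: torch.Tensor, noise_std: float = 0.01):
        super().__init__()
        out_f, in_f = base.weight.shape
        mask = torch.zeros(in_f, dtype=torch.bool)
        mask[salient_cols] = True
        self.register_buffer("mask", mask)
        # Frozen copy of the full weight matrix; the non-salient part is used as-is.
        self.register_buffer("frozen_weight", base.weight.detach().clone())
        # Trainable parameters for the salient columns only.
        self.salient_weight = nn.Parameter(base.weight.detach()[:, mask].clone())
        # Bias handling is simplified here: keep it frozen.
        self.bias = None if base.bias is None else nn.Parameter(
            base.bias.detach().clone(), requires_grad=False
        )
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.frozen_weight.clone()
        if self.training and self.noise_std > 0:
            # Inject Gaussian noise into the non-salient columns only.
            noise = torch.randn_like(weight) * self.noise_std
            weight = weight + noise * (~self.mask).to(weight.dtype)
        # Overwrite the salient columns with their trainable counterparts,
        # so gradients flow only into `self.salient_weight`.
        weight[:, self.mask] = self.salient_weight
        return F.linear(x, weight, self.bias)
```

With a wrapper like this, the optimizer only receives the salient-column parameters, so the trainable parameter count (and optimizer state) scales with the number of salient columns rather than the full weight matrix.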
Why it matters?
This research is significant because it allows for better use of LLMs in various applications by making fine-tuning more efficient. By reducing the computational resources needed and preventing forgetting, GIFT-SW can help more people and organizations effectively use advanced AI technology.
Abstract
Parameter Efficient Fine-Tuning (PEFT) methods have gained popularity and democratized the usage of Large Language Models (LLMs). Recent studies have shown that a small subset of weights significantly impacts performance. Based on this observation, we introduce a novel PEFT method, called Gaussian noise Injected Fine-Tuning of Salient Weights (GIFT-SW). Our method updates only salient columns, while injecting Gaussian noise into non-salient ones. To identify these columns, we developed a generalized sensitivity metric that extends and unifies metrics from previous studies. Experiments with LLaMA models demonstrate that GIFT-SW outperforms full fine-tuning and modern PEFT methods under the same computational budget. Moreover, GIFT-SW offers practical advantages for recovering the performance of models subjected to mixed-precision quantization while keeping salient weights in full precision.
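The exact form of the generalized sensitivity metric is not spelled out in this summary. As an illustration only, the sketch below scores each weight column by combining weight magnitude with the average squared input activation (a common ingredient of sensitivity metrics in prior quantization work) and picks the top-k columns as salient; the function names and the exponent `alpha` are assumptions made for this example.

```python
import torch

def column_sensitivity(weight: torch.Tensor, act_sq_mean: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    """Illustrative column-wise sensitivity score (not the paper's exact metric):
    sum_i |W[i, j]|**alpha * E[x_j**2] for each input column j.

    weight:      (out_features, in_features) weight matrix of a linear layer
    act_sq_mean: (in_features,) mean squared activation per input feature,
                 estimated on a small calibration set
    alpha:       exponent controlling the influence of weight magnitude
    """
    return (weight.abs() ** alpha).sum(dim=0) * act_sq_mean

def top_k_salient_columns(weight: torch.Tensor, act_sq_mean: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k columns with the highest sensitivity score."""
    return torch.topk(column_sensitivity(weight, act_sq_mean), k).indices
```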