
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu

2024-09-17


Summary

This paper presents Ferret, a method for federated full-parameter fine-tuning of large language models (LLMs) that preserves data privacy while keeping communication costs low.

What's the problem?

Fine-tuning large language models improves their performance on specific tasks, but doing it at scale is difficult. When data is spread across different locations (federated settings), the model must be updated without sharing sensitive information, and the full set of model parameters is far too large to exchange cheaply. Existing methods typically fall back on parameter-efficient fine-tuning (PEFT), which communicates only a small subset of parameters, but this usually comes at the cost of model accuracy.

What's the solution?

Ferret addresses these issues with an approach that enables full-parameter tuning of LLMs without giving up accuracy. Each client computes efficient first-order (gradient-based) local updates, then projects those updates into a low-dimensional space so that only a compact message needs to be communicated. Because the projection is generated from randomness shared with the server, the server can reconstruct an approximate full-parameter update from that compact message and aggregate it across clients, keeping the process fast and resource-light; a minimal sketch of this projection-and-reconstruction idea follows below.
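Below is a minimal, illustrative sketch of the compression idea described above: a client projects its full-parameter update onto a random basis generated from a seed shared with the server, and the server regenerates the same basis from that seed to reconstruct an approximate update. The function names, dimensions, and the use of a plain Gaussian projection are assumptions for illustration, not taken from the Ferret implementation.

```python
import numpy as np

def project_update(update, k, seed):
    """Client side: compress a full-parameter update into k coordinates
    using a random projection generated from a seed shared with the server."""
    d = update.size
    rng = np.random.default_rng(seed)           # shared randomness: same seed on client and server
    basis = rng.standard_normal((k, d)) / np.sqrt(k)
    return basis @ update.ravel()               # the only thing the client transmits

def reconstruct_update(coords, d, seed):
    """Server side: regenerate the same projection from the shared seed and
    map the k coordinates back to an approximate full-parameter update."""
    k = coords.size
    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((k, d)) / np.sqrt(k)
    return basis.T @ coords

# A client computes a first-order update, sends only k numbers, and the
# server reconstructs a full-sized (approximate) update for aggregation.
full_update = np.random.randn(10_000)           # stand-in for a model-sized gradient step
coords = project_update(full_update, k=64, seed=42)
approx = reconstruct_update(coords, d=10_000, seed=42)
print(coords.shape, approx.shape)               # (64,) (10000,)
```

Because both sides derive the projection from the same seed, the client only transmits the low-dimensional coordinates; the projection itself never has to be sent.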

Why it matters?

This research matters because it makes it practical to customize large language models for specific tasks while preserving user privacy and keeping communication costs manageable. As these models are deployed more widely in applications like chatbots and text generation, methods like Ferret will be important for fine-tuning them on data that cannot leave the devices or organizations that own it.

Abstract

Large Language Models (LLMs) have become indispensable in numerous real-world applications. Unfortunately, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing methods often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To address these limitations, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (1) it employs widely applied first-order methods for efficient local updates; (2) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (3) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights, along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.
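The three aspects listed in the abstract fit together in a per-round federated loop: clients run first-order local steps, send projected updates, and the server reconstructs and aggregates them into a full-parameter global update. The round structure below is a hedged sketch of that flow under the same shared-seed Gaussian-projection assumption as the earlier snippet; it is not the actual Ferret aggregation code, and the client counts and dimensions are illustrative.

```python
import numpy as np

def reconstruct(coords, d, seed):
    """Regenerate the shared random projection from the seed and map the
    aggregated low-dimensional coordinates back to a full-parameter update."""
    k = coords.size
    rng = np.random.default_rng(seed)          # same seed as the clients -> same basis
    basis = rng.standard_normal((k, d)) / np.sqrt(k)
    return basis.T @ coords                    # approximate d-dimensional update

# One hypothetical communication round with 8 clients.
d, k, seed = 10_000, 64, 0                     # model size, message size, shared seed (illustrative values)
global_params = np.zeros(d)
client_messages = [np.random.randn(k) for _ in range(8)]   # stand-ins for the clients' projected updates

# Because the projection is linear, averaging the k-dimensional messages and
# reconstructing once is equivalent to averaging the reconstructed full updates.
avg_coords = np.mean(client_messages, axis=0)
global_params += reconstruct(avg_coords, d, seed)
print(global_params.shape)                     # (10000,)
```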