Exploring Federated Pruning for Large Language Models
Pengxin Guo, Yinong Wang, Wei Li, Mengting Liu, Ming Li, Jinkai Zheng, Liangqiong Qu
2025-05-21
Summary
This paper introduces FedPrLLM, a new way to make huge language models smaller and faster by letting different users (clients) help with the pruning process, all while keeping their private data safe.
What's the problem?
The problem is that large language models are so big that they're hard to run on regular devices, and most methods for shrinking (pruning) them need access to shared calibration data, which isn't available when the data is private, like in healthcare or finance.
What's the solution?
To solve this, the researchers created a system where each client uses its own private data to figure out which parts of the model aren't needed, then sends only that pruning information, never the raw data, to a central server. The server combines these signals from all clients to prune the model in a way that works well for everyone, without ever seeing anyone's private information.
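To make the client/server split concrete, here is a minimal Python sketch of that flow. It is an illustration under stated assumptions, not the paper's exact FedPrLLM procedure: the importance score (weight magnitude scaled by activation norm, in the style of Wanda pruning) and the majority-vote aggregation rule are assumptions, and function names like `client_pruning_mask` and `server_aggregate` are hypothetical.

```python
import numpy as np

def client_pruning_mask(weight: np.ndarray, calib_acts: np.ndarray, sparsity: float) -> np.ndarray:
    """Client-side step: score each weight using the client's own calibration data
    and mark the lowest-scoring fraction for removal. The score used here
    (|w| * input-activation norm) is an illustrative assumption."""
    # calib_acts: (num_tokens, in_features) activations from the client's private data
    act_norm = np.linalg.norm(calib_acts, axis=0)       # per-input-channel norm
    scores = np.abs(weight) * act_norm                  # (out_features, in_features)
    k = int(sparsity * scores.size)
    threshold = np.partition(scores.ravel(), k)[k]
    return scores < threshold                           # True = prune this weight

def server_aggregate(masks: list[np.ndarray], sparsity: float) -> np.ndarray:
    """Server-side step: combine the clients' masks without ever seeing their data.
    Here the server counts 'prune' votes and removes the weights most clients
    agree on -- one plausible aggregation rule, used only for illustration."""
    votes = np.sum(masks, axis=0).astype(float)         # votes per weight across clients
    k = int(sparsity * votes.size)
    flat_idx = np.argsort(-votes.ravel())[:k]           # k most-voted weights
    global_mask = np.zeros(votes.size, dtype=bool)
    global_mask[flat_idx] = True
    return global_mask.reshape(votes.shape)

# Toy run: 3 clients, one 8x16 weight matrix, 50% sparsity
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
client_masks = [client_pruning_mask(W, rng.normal(size=(64, 16)), 0.5) for _ in range(3)]
W_pruned = np.where(server_aggregate(client_masks, 0.5), 0.0, W)
```

The key point the sketch shows is that only the boolean masks leave each client; the calibration activations, which are derived from private data, stay local.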
Why it matters?
This matters because it means powerful language models can be made small enough to run on more devices, and it can be done safely even when the data is sensitive, making advanced AI more available and secure for everyone.
Abstract
FedPrLLM is a federated pruning framework that enables privacy-preserving compression of large language models (LLMs) using local client data.