DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging
Neha Verma, Kenton Murray, Kevin Duh
2025-07-14
Summary
This paper introduces DOTResize, a new method that makes large language models smaller and faster by merging similar neurons rather than simply cutting them out.
What's the problem?
Large language models contain many neurons that compute similar things, which wastes computation and slows inference. Traditional pruning methods simply remove neurons, which can discard important information.
What's the solution?
The researchers use a mathematical technique called discrete optimal transport to group and merge neurons based on how they activate, letting the model retain useful information from every neuron while reducing its width and speeding up computation.
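To make the idea concrete, here is a minimal sketch of optimal transport-based neuron merging built on the POT (Python Optimal Transport) library. The specific choices, the `merge_neurons` helper, k-means centroids as merge targets, and uniform masses on neurons, are illustrative assumptions rather than the paper's exact procedure.

```python
# A minimal sketch of discrete optimal transport-based neuron merging,
# assuming the POT library (pip install pot) and scikit-learn.
# merge_neurons, k-means targets, and uniform masses are illustrative
# assumptions, not the paper's exact formulation.
import numpy as np
import ot                          # POT: Python Optimal Transport
from sklearn.cluster import KMeans

def merge_neurons(W_out, W_next, acts, k):
    """Shrink a layer from n neurons to k via transport-weighted merging.

    W_out:  (n, d_in)    weights that produce the n neuron activations
    W_next: (d_out, n)   weights of the next layer that consume them
    acts:   (samples, n) recorded activations used to compare neurons
    k:      target width (k < n)
    """
    n = W_out.shape[0]
    profiles = acts.T                    # (n, samples): one point per neuron
    # Merge targets: k centroids over the neurons' activation profiles.
    centroids = KMeans(n_clusters=k, n_init=10).fit(profiles).cluster_centers_
    # Cost of sending each neuron to each target (squared Euclidean).
    M = ot.dist(profiles, centroids)     # (n, k) cost matrix
    a = np.full(n, 1.0 / n)              # uniform mass on source neurons
    b = np.full(k, 1.0 / k)              # uniform mass on merged neurons
    T = ot.emd(a, b, M)                  # (n, k) optimal transport plan
    # Each merged neuron is a transport-weighted average of old neurons.
    P = T / T.sum(axis=0, keepdims=True) # column-normalized mixing weights
    W_out_new = P.T @ W_out              # (k, d_in)
    # Compensate the next layer: approximate each old activation as a
    # mixture of merged ones, so the layer's function is roughly preserved.
    R = T / T.sum(axis=1, keepdims=True) # row-normalized "unmerge" map
    W_next_new = W_next @ R              # (d_out, k)
    return W_out_new, W_next_new
```

In a real Transformer MLP block, `W_out` would correspond to the up-projection and `W_next` to the down-projection, with activations collected on calibration data before merging each layer.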
Why does it matter?
DOTResize makes large AI models practical to run on ordinary hardware without sacrificing their capabilities, helping more people and devices use advanced AI efficiently.
Abstract
DOTResize is a novel Transformer compression method based on optimal transport theory. It reduces neuron-level redundancy by merging similar neurons, and it outperforms pruning techniques while delivering real reductions in computational cost across various large language models.