DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging
Neha Verma, Kenton Murray, Kevin Duh
2025-07-14
Summary
This paper introduces DOTResize, a new method that makes large language models smaller and faster by merging similar neurons rather than simply cutting them out.
What's the problem?
Large language models contain many neurons that compute similar things, which wastes computation and slows inference. Traditional pruning methods simply remove neurons, which can discard important information.
What's the solution?
The researchers use a mathematical technique called discrete optimal transport to group and merge neurons based on how they activate, letting the model retain useful information from every neuron while reducing its width and speeding up computation.
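To make the idea concrete, here is a minimal sketch of optimal transport-based neuron merging built on the POT (Python Optimal Transport) library. The specific choices, the `merge_neurons` helper, k-means centroids as merge targets, and uniform masses on neurons, are illustrative assumptions rather than the paper's exact procedure.

```python
# A minimal sketch of discrete optimal transport-based neuron merging,
# assuming the POT library (pip install pot) and scikit-learn.
# merge_neurons, k-means targets, and uniform masses are illustrative
# assumptions, not the paper's exact formulation.
import numpy as np
import ot                          # POT: Python Optimal Transport
from sklearn.cluster import KMeans

def merge_neurons(W_out, W_next, acts, k):
    """Shrink a layer from n neurons to k via transport-weighted merging.

    W_out:  (n, d_in)    weights that produce the n neuron activations
    W_next: (d_out, n)   weights of the next layer that consume them
    acts:   (samples, n) recorded activations used to compare neurons
    k:      target width (k < n)
    """
    n = W_out.shape[0]
    profiles = acts.T                    # (n, samples): one point per neuron
    # Merge targets: k centroids over the neurons' activation profiles.
    centroids = KMeans(n_clusters=k, n_init=10).fit(profiles).cluster_centers_
    # Cost of sending each neuron to each target (squared Euclidean).
    M = ot.dist(profiles, centroids)     # (n, k) cost matrix
    a = np.full(n, 1.0 / n)              # uniform mass on source neurons
    b = np.full(k, 1.0 / k)              # uniform mass on merged neurons
    T = ot.emd(a, b, M)                  # (n, k) optimal transport plan
    # Each merged neuron is a transport-weighted average of old neurons.
    P = T / T.sum(axis=0, keepdims=True) # column-normalized mixing weights
    W_out_new = P.T @ W_out              # (k, d_in)
    # Compensate the next layer: approximate each old activation as a
    # mixture of merged ones, so the layer's function is roughly preserved.
    R = T / T.sum(axis=1, keepdims=True) # row-normalized "unmerge" map
    W_next_new = W_next @ R              # (d_out, k)
    return W_out_new, W_next_new
```

In a real Transformer MLP block, `W_out` would correspond to the up-projection and `W_next` to the down-projection, with activations collected on calibration data before merging each layer.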
Why does it matter?
DOTResize makes large AI models practical to run on ordinary hardware without sacrificing their capabilities, helping more people and devices use advanced AI efficiently.
Abstract
DOTResize is a novel Transformer compression method based on optimal transport theory. It reduces neuron-level redundancy by merging similar neurons, and it outperforms pruning techniques while delivering real reductions in computational cost across various large language models.