
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching

Guinan Su, Li Shen, Lu Yin, Shiwei Liu, Yanwu Yang, Jonas Geiping

2025-06-26


Summary

This paper introduces GPTailor, a method that shrinks large language models by cutting layers and stitching together layers taken from multiple fine-tuned versions of the same base model, producing a smaller model that is still powerful.

What's the problem?

The problem is that large language models are very big, which makes them expensive and hard to run. Existing methods for shrinking them usually work with just one model, and the compressed model can lose important abilities.

What's the solution?

The researchers treat compression as an optimization problem over several fine-tuned variants of the base model: for each layer position, the search decides whether to remove the layer, keep it from one particular variant, or merge the corresponding layers from several variants. By mixing the strengths of different fine-tuned versions, the resulting smaller model keeps most of the original model's performance.
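
To make the cutting-and-stitching idea concrete, here is a minimal Python/PyTorch sketch, not the authors' code. It assumes all fine-tuned variants share the same depth and layer shapes, and the names candidate_models, plan, merge_layers, and stitch_model are hypothetical illustrations. The paper's actual contribution is searching for a good plan; only applying a given plan is shown here.

```python
# Minimal sketch (assumed, not the authors' implementation) of building a smaller
# layer stack by dropping, selecting, or merging layers from fine-tuned variants.
import copy
import torch
import torch.nn as nn


def merge_layers(layers):
    """Average the parameters of same-shaped layers taken from different variants."""
    merged = copy.deepcopy(layers[0])
    with torch.no_grad():
        for name, param in merged.named_parameters():
            stacked = torch.stack([dict(l.named_parameters())[name] for l in layers])
            param.copy_(stacked.mean(dim=0))
    return merged


def stitch_model(candidate_models, plan):
    """Build a shorter layer stack from a per-position plan.

    candidate_models: list of nn.ModuleList, one per fine-tuned variant,
                      all with the same depth and layer shapes.
    plan: one entry per original layer position:
          "drop"        -> remove the layer entirely,
          int i         -> keep that position's layer from variant i,
          list of ints  -> merge (average) that position's layer across variants.
    """
    stitched = []
    for pos, action in enumerate(plan):
        if action == "drop":
            continue
        if isinstance(action, int):
            stitched.append(copy.deepcopy(candidate_models[action][pos]))
        else:
            stitched.append(merge_layers([candidate_models[i][pos] for i in action]))
    return nn.ModuleList(stitched)


# Toy usage: three "variants" of a 6-layer model, pruned to 4 layers.
depth, width = 6, 32
variants = [nn.ModuleList(nn.Linear(width, width) for _ in range(depth)) for _ in range(3)]
plan = [0, "drop", [0, 1, 2], 2, "drop", 1]
small = stitch_model(variants, plan)
print(len(small), "layers kept out of", depth)
```

In the actual method, the plan itself would be optimized (e.g., searched under a compression budget while measuring downstream performance); the sketch only shows how a chosen plan turns several fine-tuned variants into one smaller stack of layers.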

Why does it matter?

This matters because it helps make large AI models easier to use in everyday applications by reducing their size and computing needs while still keeping their abilities, allowing more people and devices to benefit from advanced AI.

Abstract

A new strategy merges layers from fine-tuned model variants to compress large language models with minimal performance loss.