Model Merging in Pre-training of Large Language Models
Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, LingJun Liu, Bole Ma, Xiaoying Jia, Zhou Xun
2025-05-20
Summary
This paper presents a way to make large language models stronger and cheaper to train by merging model checkpoints produced during the pre-training process itself.
What's the problem?
The problem is that pre-training very large language models takes enormous amounts of time, money, and compute, and squeezing out the best possible performance usually means spending even more of those resources.
What's the solution?
To address this, the researchers show that merging checkpoints saved along the way, while the model is still training, boosts performance and makes the overall training process cheaper and faster, and the gains hold across different model architectures and learning-rate settings.
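The core mechanic is simple parameter-wise averaging of checkpoints from the same training run. Below is a minimal sketch of that idea; the helper name `merge_checkpoints` and the uniform weighting are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch: average several checkpoints of the SAME architecture,
# parameter by parameter, and load the result into a fresh model.
import torch
import torch.nn as nn


def merge_checkpoints(state_dicts, weights=None):
    """Merge checkpoints by a weighted average of their parameters."""
    if weights is None:
        # Uniform average (an assumption; other weighting schemes are possible).
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged


if __name__ == "__main__":
    # Stand-ins for checkpoints saved at different pre-training steps.
    checkpoints = [nn.Linear(8, 8).state_dict() for _ in range(3)]
    merged = merge_checkpoints(checkpoints)

    model = nn.Linear(8, 8)
    model.load_state_dict(merged)  # the merged model is used directly, no extra training
    print({k: v.shape for k, v in merged.items()})
```

In practice the averaged checkpoints would come from the same pre-training run rather than freshly initialized layers; the toy `nn.Linear` models here just keep the example self-contained.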
Why it matters?
This matters because it means more capable language models can be built without as much expensive hardware, making advanced AI tools more accessible to everyone.
Abstract
Model merging during pre-training enhances large language models, improving performance and reducing costs across various architectures and learning rates.