Configurable Foundation Models: Building LLMs from a Modular Perspective

Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang

2024-09-09

Summary

This paper proposes a new way to build large language models (LLMs) from modular components, allowing more flexibility and efficiency in how these models are adapted and deployed.

What's the problem?

As LLMs have become more advanced, they require enormous computational power and resources, making them difficult to run on devices with limited capabilities. Additionally, current models often cannot adapt to new tasks without extensive retraining, which is inefficient.

What's the solution?

The authors propose a modular approach in which LLMs are decomposed into smaller functional units called 'bricks.' Individual bricks can then be used on their own or combined in different ways to handle various tasks. The paper distinguishes 'emergent bricks,' functional neuron partitions that form on their own during pre-training, from 'customized bricks,' which are trained afterward to add new capabilities or knowledge. It also defines four operations on bricks (retrieval and routing, merging, updating, and growing) that allow an LLM to be dynamically configured for the task at hand.
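To make the four brick operations concrete, here is a minimal toy sketch in which each brick is just a flat list of parameters. All names, the key-matching routing scheme, and the parameter-averaging merge are illustrative assumptions, not the paper's actual implementation; the paper surveys many concrete variants of each operation.

```python
# Toy sketch of the four brick-oriented operations.
# Bricks are represented as flat parameter lists; the routing keys,
# averaging merge, and gradient-step update are illustrative choices.

def route(task_embedding, brick_keys, top_k=2):
    """Retrieval and routing: pick the top-k bricks whose key vectors
    best match a task embedding (dot-product similarity)."""
    scores = {name: sum(a * b for a, b in zip(key, task_embedding))
              for name, key in brick_keys.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def merge(bricks):
    """Merging: combine bricks with similar functions, here by
    simple parameter averaging."""
    n = len(bricks)
    return [sum(params) / n for params in zip(*bricks)]

def update(brick, grad, lr=0.1):
    """Updating: refine one brick in place, leaving the rest of the
    model's bricks untouched."""
    return [p - lr * g for p, g in zip(brick, grad)]

def grow(model_bricks, new_brick):
    """Growing: extend the model by attaching a newly trained brick."""
    return model_bricks + [new_brick]
```

For example, `route([1, 0], {"math": [1, 0], "code": [0, 1]}, top_k=1)` would select the "math" brick, and `merge([[1, 1], [3, 3]])` yields the averaged brick `[2.0, 2.0]`.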

Why it matters?

This research is important because it offers a new perspective on building and using LLMs that could lead to more efficient models. By allowing for modular construction, developers can create LLMs that are easier to scale and customize for specific applications, making AI technology more accessible and versatile.

Abstract

Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing for inference with part of modules and dynamic assembly of modules to tackle complex tasks, such as mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitation of configurable foundation models. We first formalize modules into emergent bricks - functional neuron partitions that emerge during the pre-training phase, and customized bricks - bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis on widely-used LLMs. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and inspire the future creation of more efficient and scalable foundational models.
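The abstract cites mixture-of-experts as an existing example of inference with only part of a model's modules. A minimal sketch of that idea, assuming a simple linear gate and scalar-output experts (these choices are illustrative, not the paper's design):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_weights, experts, top_k=1):
    """Mixture-of-experts style routing: score each expert with a linear
    gate, keep only the top-k experts, and combine their outputs weighted
    by renormalized gate probabilities. Only the selected experts run,
    which is the source of the efficiency gain."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i],
                 reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)
```

With top-k routing, compute scales with k rather than with the total number of experts, which is why the paper highlights partial-module inference as a path to efficiency on resource-limited devices.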