Densing Law of LLMs
Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Xu Han, Zhiyuan Liu, Maosong Sun
2024-12-06

Summary
This paper introduces the 'Densing Law of LLMs,' a new way to measure how effective and efficient large language models (LLMs) are as they grow in size, centered on a metric called 'capacity density.'
What's the problem?
As LLMs become larger and more complex, training and using them efficiently becomes challenging. This growth can lead to high costs and resource demands, making it hard to deploy these models in situations where resources are limited. Additionally, simply increasing the size of a model doesn't always guarantee better performance.
What's the solution?
The authors propose a new metric called 'capacity density' to evaluate LLMs based on their effectiveness relative to their size. They fit a scaling law to a set of reference models so that downstream performance can be predicted from parameter size, define a target model's effective parameter size as the size a reference model would need to match its performance, and take capacity density as the ratio of effective to actual parameters. Tracking this metric over recent open-source models, they find that capacity density doubles approximately every three months, suggesting that future development should focus on improving this metric to achieve better results without needing excessive resources.
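To make the metric concrete, here is a minimal sketch of how capacity density could be computed. It assumes a simple logistic fit of benchmark score against log parameter count, which is a simplification of the paper's two-step (loss, then performance) scaling fit, and all model sizes and scores below are invented for illustration.

```python
# Hypothetical sketch: estimate capacity density from reference-model benchmark scores.
# The fit form (a logistic curve over log10 parameter count) simplifies the paper's
# two-step scaling-law estimation; the data points are made up.
import numpy as np
from scipy.optimize import curve_fit

def perf_from_size(log_n, a, b, c):
    """Predicted benchmark score as a function of log10(parameter count)."""
    return c / (1.0 + np.exp(-(a * log_n + b)))

# Reference models: parameter counts (billions) and benchmark scores (illustrative).
ref_params = np.array([0.5, 1.5, 3.0, 7.0, 13.0, 34.0, 70.0])
ref_scores = np.array([0.28, 0.35, 0.42, 0.52, 0.58, 0.66, 0.71])

popt, _ = curve_fit(perf_from_size, np.log10(ref_params), ref_scores, p0=[1.0, 0.0, 0.8])

def effective_params(score, popt):
    """Invert the fitted curve: the size a reference model would need to reach `score`."""
    a, b, c = popt
    log_n = (np.log(score / (c - score)) - b) / a
    return 10.0 ** log_n

# Capacity density of a hypothetical 2.4B target model scoring 0.55 on the same benchmark.
target_params, target_score = 2.4, 0.55
density = effective_params(target_score, popt) / target_params
print(f"capacity density ~= {density:.2f}")
```

A density above 1 means the target model performs like a larger reference model, i.e., it packs more capability per parameter.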
Why it matters?
This research is significant because it provides insights into how to develop more efficient and effective language models. By understanding and improving capacity density, researchers can create LLMs that deliver high performance while using fewer resources, making advanced AI technology more accessible and sustainable for various applications.
Abstract
Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as the model size increases. However, this scaling brings great challenges to training and inference efficiency, particularly for deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of the LLMs across different scales and describes the trend of LLMs in terms of both effectiveness and efficiency. To calculate the capacity density of a given target LLM, we first introduce a set of reference models and develop a scaling law to predict the downstream performance of these reference models based on their parameter sizes. We then define the effective parameter size of the target LLM as the parameter size required by a reference model to achieve equivalent performance, and formalize the capacity density as the ratio of the effective parameter size to the actual parameter size of the target LLM. Capacity density provides a unified framework for assessing both model effectiveness and efficiency. Our further analysis of recent open-source base LLMs reveals an empirical law (the Densing Law) that the capacity density of LLMs grows exponentially over time. More specifically, using some widely used benchmarks for evaluation, the capacity density of LLMs doubles approximately every three months. The law provides new perspectives to guide future LLM development, emphasizing the importance of improving capacity density to achieve optimal results with minimal computational overhead.
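The exponential trend behind the Densing Law can be read off from a log-linear fit of density against release date: if ln(density) = A·t + B, the doubling time is ln(2)/A. The sketch below illustrates that calculation; the (release month, density) pairs are invented and only the fitting procedure is meaningful.

```python
# Illustrative sketch of the Densing Law's exponential form: fit ln(density) = A*t + B
# over release dates and read off the doubling time ln(2)/A in months.
# The density values below are hypothetical.
import numpy as np

release_months = np.array([0, 4, 8, 12, 16, 20])               # months since an arbitrary start
densities      = np.array([1.0, 2.3, 5.8, 14.0, 39.0, 98.0])   # hypothetical capacity densities

A, B = np.polyfit(release_months, np.log(densities), 1)
doubling_time = np.log(2) / A
print(f"estimated doubling time ~= {doubling_time:.1f} months")
```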