Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Yu Miller, Siddharth Singh, Abhinav Bhatele, Micah Goldblum, Ashwinee Panda, Tom Goldstein

2025-02-12

Summary

This paper introduces Gemstones, a new collection of AI models designed to study how different factors affect the performance of large language models as they grow in size and complexity. It's like creating a huge set of Lego models with different shapes and sizes to see how they all work together.

What's the problem?

Usually, when researchers study how AI models improve as they get bigger (called scaling laws), they only look at a small range of model types. This is like only studying one type of Lego brick. The problem is that this limited view might not give us accurate predictions about how different types of AI models will perform as they grow.

What's the solution?

The researchers created Gemstones, a huge collection of over 4,000 model checkpoints. These models vary in size (up to 2 billion parameters), shape (different widths and depths), and how they were trained (different learning rates and cooldown schedules). By studying all these different models, they can get a much more complete picture of how AI performance changes with different designs. They've made all this data available for other researchers to use.

Why it matters?

This matters because it helps us understand AI models better, which is crucial as these models become more important in our daily lives. By knowing how different designs affect performance, we can build better, more efficient AI systems. It's like having a complete guidebook for building the best Lego structures, but for AI. This research could lead to smarter, more capable AI that uses less computing power, which is good for both technology advancement and the environment.

Abstract

Scaling laws are typically fit using a family of models with a narrow range of frozen hyper-parameter choices. In this work we study scaling laws using a wide range of architecture and hyper-parameter choices, and highlight their impact on resulting prescriptions. As a primary artifact of our research, we release the Gemstones: the most comprehensive open-source scaling law dataset to date, consisting of over 4000 checkpoints from transformers with up to 2 billion parameters; these models have been trained with different learning rates, cooldown schedules, and architectural shapes. Our checkpoints enable more complex studies of scaling, such as a law that predicts language modeling performance as a function of model width and depth. By examining the various facets of our model suite, we find that the prescriptions of scaling laws can be highly sensitive to the experimental design process and the specific model checkpoints used during fitting. Code: https://github.com/mcleish7/gemstone-scaling-laws
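To make the idea concrete, here is a minimal sketch of fitting a scaling law that predicts loss as a function of model width and depth, in the spirit of the abstract. The functional form (a sum of power laws plus an irreducible term), the coefficients, and the synthetic "checkpoint" data are all assumptions for illustration; they are not the paper's actual parameterization or results.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed functional form (illustrative only, not the paper's exact law):
# loss(width, depth) = A * width^-alpha + B * depth^-beta + E
def scaling_law(X, A, alpha, B, beta, E):
    width, depth = X
    return A * width ** -alpha + B * depth ** -beta + E

# Synthetic stand-in for a suite of checkpoints: a grid of widths and depths
# with losses generated from a known ground-truth law plus small noise.
rng = np.random.default_rng(0)
widths = np.array([256, 512, 1024, 2048, 4096] * 4, dtype=float)
depths = np.repeat([4.0, 8.0, 16.0, 32.0], 5)
true_loss = scaling_law((widths, depths), 50.0, 0.6, 8.0, 0.4, 1.8)
observed = true_loss + rng.normal(0.0, 0.01, size=true_loss.shape)

# Fit the law's five parameters to the observed losses.
params, _ = curve_fit(
    scaling_law, (widths, depths), observed,
    p0=[10.0, 0.5, 10.0, 0.5, 1.0], maxfev=20000,
)
A, alpha, B, beta, E = params
print(f"alpha={alpha:.2f}, beta={beta:.2f}, irreducible loss E={E:.2f}")
```

The abstract's point about sensitivity can be seen with a sketch like this: refitting after dropping a subset of the (width, depth) grid, or changing which checkpoints are included, can shift the fitted exponents and therefore the "optimal shape" the law prescribes.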