
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate

A. Bochkov

2025-07-11

Summary

This paper introduces a way to grow transformer AI models by adding layers and combining modules on top of a fixed core, which improves their performance on reasoning tasks without retraining from scratch.

What's the problem?

Usually, when transformer models are made bigger by adding layers, they can forget what they learned before (a problem known as catastrophic forgetting), and retraining large models from scratch is inefficient and expensive.

What's the solution?

The researchers developed a method that freezes the model's input embeddings (the "substrate") and then gradually adds new layers and modules on top in a structured way, allowing the model to expand and improve without losing previously learned knowledge.
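The core idea can be sketched in a few lines of PyTorch. This is an illustrative toy, not the paper's actual code: the class name `GrowingTransformer` and the `grow()` method are assumptions made up for this example. It shows the two mechanisms the summary describes: the embedding weights are frozen so they never receive gradients, while new trainable layers are appended one at a time.

```python
# Hedged sketch of layer-wise growth on a frozen embedding substrate.
# Assumes PyTorch; GrowingTransformer and grow() are illustrative names,
# not APIs from the paper.
import torch
import torch.nn as nn

class GrowingTransformer(nn.Module):
    def __init__(self, vocab_size=100, d_model=32, nhead=4):
        super().__init__()
        # Frozen substrate: the embedding table is fixed and gets no gradients.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.embed.weight.requires_grad_(False)
        self.layers = nn.ModuleList()  # starts empty; grown over time
        self.d_model, self.nhead = d_model, nhead

    def grow(self):
        # Layer-wise expansion: append a fresh, trainable transformer layer.
        self.layers.append(
            nn.TransformerEncoderLayer(self.d_model, self.nhead,
                                       batch_first=True)
        )

    def forward(self, tokens):
        x = self.embed(tokens)
        for layer in self.layers:
            x = layer(x)
        return x

model = GrowingTransformer()
model.grow()
model.grow()  # the stack now has two layers
out = model(torch.randint(0, 100, (2, 5)))
```

Because the embeddings stay fixed, every newly grown layer is trained against the same stable representation, which is what lets independently trained modules be composed later without disturbing what earlier layers learned.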

Why it matters?

This matters because it enables building larger and more capable AI models that keep learning new things efficiently while retaining what they already know, making them more reliable and effective for complex tasks.

Abstract

Transformers with frozen embeddings enable efficient scaling through modular composition and layer-wise growth, improving performance on reasoning tasks without catastrophic forgetting.