Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Ziyue Li, Yang Li, Tianyi Zhou
2025-07-11
Summary
This paper introduces a way to make large language models (LLMs) both faster and more accurate by adapting how many layers they use for each individual input at test time.
What's the problem?
Normally, LLMs run every input through the same fixed stack of layers, which is inefficient: easy questions may need less computation while hard ones may need more. Moreover, finding a better layer configuration for a given input is difficult without retraining the model.
What's the solution?
The researchers propose Chain-of-Layers (CoLa), which treats a pretrained model's layers as building blocks that can be skipped or repeated (looped) to form a custom depth path for each input. A Monte Carlo Tree Search (MCTS) explores these paths while the model is running, with no retraining, to find a chain that answers the question efficiently and accurately.
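To make the core idea concrete, here is a minimal sketch (hypothetical code, not the authors' implementation): a model's layers become reusable blocks, and a "chain of layers" is just a sequence of layer indices, where omitting an index skips that layer and repeating an index loops it. The toy `make_layer` blocks stand in for pretrained transformer layers; the search over chains (which the paper does with MCTS) is not shown.

```python
def make_layer(w):
    # Stand-in for a pretrained transformer block: a simple
    # elementwise transform h -> w*h + 1 (purely illustrative).
    return lambda h: [w * v + 1 for v in h]

# A "pretrained model" with four layers.
layers = [make_layer(w) for w in (0.5, 2.0, 1.0, 3.0)]

def apply_chain(layers, chain, h):
    """Run hidden state h through the layer indices in `chain`.

    chain=[0, 1, 2, 3] reproduces the original depth;
    [0, 2] skips layers 1 and 3 (a shallower path);
    [0, 1, 1, 2] loops layer 1 twice (a recurrent path).
    """
    for i in chain:
        h = layers[i](h)
    return h

full = apply_chain(layers, [0, 1, 2, 3], [1.0])   # original architecture
skip = apply_chain(layers, [0, 2], [1.0])         # cheaper path for easy inputs
loop = apply_chain(layers, [0, 1, 1, 2], [1.0])   # deeper path for hard inputs
```

At test time, a search procedure such as MCTS would score candidate chains like `skip` and `loop` on the current input and keep the best one, trading a small search cost for a per-input architecture.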
Why it matters?
This matters because it lets large language models use less compute and run faster without sacrificing quality, making advanced AI more practical to deploy in everyday applications.
Abstract
A method combining Chain-of-Layers (CoLa) with Monte Carlo Tree Search (MCTS) optimizes the architecture of a pretrained large language model for each individual input, improving both inference efficiency and accuracy.