
Sleep-time Compute: Beyond Inference Scaling at Test-time

Kevin Lin, Charlie Snell, Yu Wang, Charles Packer, Sarah Wooders, Ion Stoica, Joseph E. Gonzalez

2025-04-18


Summary

This paper introduces sleep-time compute, a new idea where AI models do some of their thinking and preparation ahead of time, kind of like doing homework before a test, so they can answer questions faster and more accurately when you actually use them.

What's the problem?

The problem is that large language models usually need a lot of computing power every time they answer a question, which can be slow and expensive, especially for complex reasoning tasks. This makes it hard to use these models efficiently in real-time situations.

What's the solution?

The researchers introduced the concept of sleep-time compute, where the AI does a batch of offline calculations and stores useful information before anyone asks it a question. Then, when it's time to actually answer something, the model can draw on this pre-computed knowledge, making the process much quicker and often more accurate.
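The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names (`precompute_insights`, `answer_query`), the in-memory `CACHE`, and the string-matching "model" are all assumptions standing in for real LLM calls.

```python
# Hypothetical sketch of sleep-time compute. In practice both phases
# would call an LLM; here cheap string operations stand in for that.

CACHE = {}  # doc_id -> precomputed insights (assumed in-memory store)

def precompute_insights(doc_id, context):
    """Offline ("sleep-time") phase: derive reusable facts from the raw
    context before any user query arrives."""
    # Stand-in for an LLM pass that expands the context into inferences.
    insights = [f"inference about: {s.strip()}"
                for s in context.split(".") if s.strip()]
    CACHE[doc_id] = insights
    return insights

def answer_query(doc_id, query):
    """Test-time phase: reuse the cached insights, so per-query work is
    small compared with re-reading and re-reasoning over the raw context."""
    insights = CACHE.get(doc_id, [])
    # Stand-in for a cheap retrieval step over precomputed material.
    relevant = [i for i in insights
                if any(w in i for w in query.lower().split())]
    return relevant or insights[:1]

precompute_insights("doc1", "Alice owns 3 cats. Bob owns 2 dogs.")
print(answer_query("doc1", "cats"))
# -> ['inference about: Alice owns 3 cats']
```

The key point the sketch captures is the split: the expensive pass over the context happens once, offline, and every later query only pays for a lookup.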

Why does it matter?

This matters because it allows AI systems to be both faster and smarter without needing massive amounts of computer power every time you use them, making advanced AI more practical and accessible for everyone.

Abstract

Introducing sleep-time compute, which performs offline pre-computations, reduces test-time compute requirements and improves accuracy on reasoning tasks for large language models.