Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Ziyue Li, Chenrui Fan, Tianyi Zhou
2025-06-27
Summary
This paper studies grokking, a surprising behavior in which large language models keep getting better at generalizing to new data even after their training loss has stopped improving.
What's the problem?
The problem is that once a model's training loss stops improving, it is hard to tell whether the model is still learning to generalize to new, unseen data. Checking this normally requires repeatedly evaluating on held-out test sets, which is costly during large-scale pretraining.
What's the solution?
The researchers studied grokking during the large-scale pretraining of a large language model and found that different domains of data make the memorization-to-generalization transition at different times. They show that this transition is reflected in how the model processes information internally: as training continues, the internal pathways the model uses become more structured and more consistent across similar inputs. Based on this, they developed metrics that track generalization progress from training data alone, without needing held-out tests.
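To make the idea of "consistent internal pathways" concrete, here is a minimal illustrative sketch (not the paper's actual metric) of how one might quantify pathway consistency without a test set. It assumes each input is reduced to a discrete routing sequence (e.g., which expert or unit handled each token) and scores how similar those sequences are across inputs; the function names and the edit-distance-based score are our own simplification.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two routing sequences (lists of unit IDs)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def pathway_consistency(pathways):
    """1 minus the mean normalized pairwise edit distance.

    Close to 1.0 means similar inputs take near-identical internal routes
    (a hypothetical proxy for generalization); lower values mean each
    input is handled idiosyncratically (a proxy for memorization).
    """
    pairs = list(combinations(pathways, 2))
    dists = [edit_distance(a, b) / max(len(a), len(b)) for a, b in pairs]
    return 1.0 - sum(dists) / len(dists)

# Toy comparison: early checkpoint with scattered routing vs. a later
# checkpoint that routes similar inputs through nearly the same units.
early = [[0, 3, 1, 2], [2, 0, 3, 1], [1, 2, 0, 3]]
late  = [[0, 3, 1, 2], [0, 3, 1, 2], [0, 3, 2, 2]]
print(pathway_consistency(early), pathway_consistency(late))
```

Tracked over pretraining checkpoints, a rising score of this kind would signal the memorization-to-generalization shift the paper describes, with no test evaluation involved.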
Why it matters?
This matters because knowing when and how models move from memorizing training examples to genuinely generalizing helps researchers improve training methods and build better, more reliable AI systems.
Abstract
Grokking, or continued test performance improvement after training loss convergence, is observed during pretraining of a large language model, showcasing a memorization-to-generalization process.