
Emergent properties with repeated examples

François Charton, Julia Kempe

2024-10-13


Summary

This paper explores how repeating training examples affects the performance of transformer models, specifically in solving mathematical problems.

What's the problem?

In machine learning, there's often a debate about whether it's better to train models on a large variety of examples or to repeat a smaller set of examples. This study looks at how the number of times examples are repeated during training influences the model's ability to learn and perform well on specific tasks, particularly in mathematics.

What's the solution?

The authors ran experiments on three mathematical tasks: computing the greatest common divisor, modular multiplication, and calculating the eigenvalues of matrices. They found that models trained on smaller sets of repeated examples outperformed models trained on larger sets of single-use examples, even when the number of training steps was the same. They also found that "two-set training", in which a small random subset of examples is reused repeatedly while the rest of the training set is sampled normally, led to faster learning and better performance. This suggests that the benefits of repetition can outweigh those of sheer data diversity.
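The two-set scheme is simple enough to sketch in a few lines of Python. The sketch below is an illustration of the idea rather than the authors' code: a small subset of examples is drawn once and reused throughout training, while all other examples are generated fresh. The function names (make_two_set_sampler, gcd_example) and the subset size and mixing probability are assumptions chosen for illustration, not values taken from the paper.

```python
import math
import random

def make_two_set_sampler(generate_example, repeated_size=10_000,
                         repeat_prob=0.25, seed=0):
    """Return a sampler that mixes a fixed, repeated subset with fresh examples."""
    rng = random.Random(seed)
    # Small fixed subset, drawn once and then seen many times during training.
    repeated = [generate_example(rng) for _ in range(repeated_size)]

    def sample():
        if rng.random() < repeat_prob:
            return rng.choice(repeated)   # reuse an example from the small subset
        return generate_example(rng)      # otherwise draw a fresh example
    return sample

# Example task: greatest common divisor pairs (one of the paper's three problems).
def gcd_example(rng):
    a, b = rng.randint(1, 10**6), rng.randint(1, 10**6)
    return (a, b, math.gcd(a, b))

sample = make_two_set_sampler(gcd_example)
batch = [sample() for _ in range(32)]     # one batch for a training step
```

In a full experiment, batches produced this way would feed an ordinary transformer training loop; only the sampling scheme changes between single-use, repeated, and two-set training.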

Why it matters?

This research is important because it provides insights into how models learn from data. Understanding the balance between memorization (repeating examples) and generalization (learning from diverse examples) can help improve training strategies for AI systems. This could lead to better performance in various applications, especially in fields requiring precise calculations or problem-solving skills.

Abstract

We study the performance of transformers as a function of the number of repetitions of training examples with algorithmically generated datasets. On three problems of mathematics: the greatest common divisor, modular multiplication, and matrix eigenvalues, we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training - repeated use of a small random subset of examples, alongside normal sampling from the rest of the training set - provides for faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.