AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua, Steven Wu, Ge Zhang, Ke Shen
2025-05-13
Summary
This paper introduces AttentionInfluence, a new method that selects the most useful pretraining data for language models by looking at how the model's attention heads respond to each document.
What's the problem?
The problem is that language models are pretrained on enormous amounts of data, but not all of it is equally useful. If low-quality data is chosen, the model ends up weaker at understanding and reasoning, which hurts its overall performance.
What's the solution?
The researchers created AttentionInfluence, which needs no extra training and no labeled data. Instead, it uses a technique called attention head masking to estimate which documents are most helpful for the model to learn from. This leads to better data selection and stronger reasoning skills in the final model.
Why it matters?
This matters because it makes language models smarter and more efficient without requiring extra training or human labeling. Better data selection means the model can learn faster and perform better on tasks that require understanding and reasoning, which is important for everything from homework help to advanced research.
Abstract
AttentionInfluence, a training-free, unsupervised method using attention head masking, improves data selection for pretraining LLMs, enhancing their reasoning abilities across various benchmarks.