Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali
2025-08-12
Summary
This paper introduces LessIsMore, a training-free sparse attention method that helps AI models reason more efficiently by focusing only on the important parts of the input when solving reasoning problems. It aggregates token selections from the model's local attention heads to decide which tokens deserve attention, without requiring any extra training.
What's the problem?
AI models struggle with reasoning tasks because attending to every piece of the input takes substantial time and computing power, which makes the models slower and can hurt their ability to generalize to new problems.
What's the solution?
LessIsMore is a training-free sparse attention technique: it selects important tokens identified by local attention heads and aggregates those selections into a single shared set, so the model attends only to essential information. This shrinks the amount of data processed at each step, making reasoning faster and helping the model generalize better, all without costly retraining.
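To make the aggregation idea concrete, here is a minimal sketch in plain NumPy. This is a hypothetical illustration, not the paper's actual algorithm: the function name, the vote-then-score ranking rule, and the parameters `k_per_head` and `k_total` are all assumptions introduced for the example.

```python
import numpy as np

def aggregate_token_selection(attn_scores, k_per_head, k_total):
    """Hypothetical sketch of cross-head token aggregation.

    Each head proposes its top-k_per_head tokens; tokens chosen by more
    heads (ties broken by summed score) are kept, up to k_total tokens.
    attn_scores: array of shape (num_heads, seq_len).
    """
    num_heads, seq_len = attn_scores.shape
    votes = np.zeros(seq_len)   # how many heads picked each token
    total = np.zeros(seq_len)   # summed scores from the heads that picked it
    for h in range(num_heads):
        topk = np.argsort(attn_scores[h])[-k_per_head:]
        votes[topk] += 1
        total[topk] += attn_scores[h, topk]
    # Rank tokens by (vote count, then summed score), descending,
    # and return the k_total winners in sequence order.
    order = np.lexsort((total, votes))[::-1]
    return np.sort(order[:k_total])
```

With two heads over a five-token sequence, a token that both heads rank highly is kept ahead of tokens favored by only one head, so all heads then share the same small attended set.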
Why it matters?
Making AI models more efficient while preserving their ability to generalize lets them solve complex reasoning tasks faster and with fewer computational resources. That makes AI more practical and powerful for real-world applications where quick, accurate reasoning is needed.
Abstract
LessIsMore is a training-free sparse attention mechanism that improves efficiency and generalization in reasoning tasks by aggregating token selections from local attention heads.