Don't Pay Attention
Mohammad Hammoud, Devang Acharya
2025-06-16
Summary
This paper introduces Avey, a new neural network architecture that breaks from the Transformer design. Avey pairs a ranker with an autoregressive neural processor to focus only on the most relevant parts of a sequence, letting it handle very long sequences more effectively than Transformers while remaining competitive on shorter ones.
What's the problem?
The problem is that Transformers, while dominant, struggle with very long sequences: the compute and memory costs of their attention mechanism grow quadratically with input length. They are also restricted to a fixed context window, which makes it hard to capture dependencies between parts of a sequence that lie far apart.
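To make the bottleneck concrete, here is a minimal NumPy sketch of vanilla scaled dot-product self-attention (not code from the paper). The n × n score matrix it builds is the source of the quadratic growth: doubling the sequence length quadruples both the arithmetic and the memory for that matrix.

```python
import numpy as np

def self_attention(x):
    """Vanilla scaled dot-product self-attention over a length-n sequence.

    The (n, n) score matrix is what makes compute and memory grow
    quadratically with sequence length n.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                     # shape (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ x                                # contextualized embeddings

x = np.random.randn(512, 64)       # 512 tokens, 64-dim embeddings
out = self_attention(x)            # internally materializes a 512 x 512 matrix
```

Going from 512 to 1,024 tokens grows the score matrix from 262,144 to over a million entries, which is why long inputs quickly exhaust memory.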
What's the solution?
The solution is Avey’s new design, which uses neither attention nor recurrence. Instead, a ranker selects the most relevant portions of the input, and a neural processor then enriches and contextualizes that information efficiently. By decoupling sequence length from context width, this design lets Avey process sequences of any length without slowing down or running out of memory, even when trained only on shorter sequences.
Why does it matter?
This matters because Avey’s ability to handle very long sequences far better than Transformers could improve AI systems that must understand and generate long texts, such as books or extended conversations. Its greater speed and efficiency also matter for real-time applications like chatbots and other tools that draw on large amounts of information.
Abstract
Avey, a new neural architecture combining a ranker and an autoregressive processor, demonstrates superior performance over the Transformer in processing long-range dependencies and has competitive short-range capabilities.