Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Zayd M. K. Zuhri, Erland Hilman Fuadi, Alham Fikri Aji

2025-05-01

Summary

This paper introduces Softpick, a replacement for the softmax function in transformer attention that helps AI models distribute their attention more sensibly and run more efficiently on ordinary hardware.

What's the problem?

The usual method, called softmax, forces every attention head to spread exactly 100% of its attention across the input. This causes attention sinks, where heads dump most of their attention onto a token (often the first one) that carries no useful information, and massive activations, where a few internal values become extremely large. Both effects waste capacity and make the model harder to quantize and run on smaller devices.

What's the solution?

The researchers created Softpick, a rectified variant of softmax that drops the requirement that attention weights sum to one. Unhelpful scores can map to exactly zero, so attention maps become sparse, sinks disappear, and activations stay in a normal range. This makes the model easier to compress and cheaper to run, especially on simpler or lower-precision hardware.
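To make the contrast concrete, here is a minimal NumPy sketch of the idea. The exact formulation below, ReLU(e^x − 1) normalized by Σ|e^x − 1|, is an assumption based on the paper's description of a "rectified softmax"; the function names and the epsilon guard are illustrative, not the authors' reference implementation.

```python
import numpy as np

def softmax(x):
    """Standard softmax: weights are strictly positive and sum to exactly 1."""
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def softpick(x, eps=1e-8):
    """Rectified-softmax sketch (assumed form: ReLU(e^x - 1) / sum|e^x - 1|).

    Unlike softmax, outputs can be exactly zero (sparse) and can sum to
    less than 1, so no token is forced to absorb leftover attention mass.
    """
    m = np.exp(x) - 1.0
    num = np.maximum(m, 0.0)       # ReLU: scores below 0 map to exactly 0
    den = np.abs(m).sum() + eps    # normalize by total magnitude of all scores
    return num / den

scores = np.array([2.0, 0.5, -1.0, -3.0])
print(softmax(scores))   # all four weights positive, summing to 1
print(softpick(scores))  # last two weights exactly 0, sum below 1
```

Because negative scores produce exact zeros rather than small positive weights, a head with nothing useful to attend to can simply output (near-)zero everywhere instead of inventing a sink token to dump its mandatory attention mass on.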

Why it matters?

This matters because models that avoid attention sinks and massive activations are easier to quantize and run in low precision, so advanced AI can work faster and more efficiently even on devices that aren't very powerful, making the technology more accessible.

Abstract

Softpick is a drop-in replacement for softmax in transformer attention mechanisms that eliminates attention sinks and massive activations and yields sparse attention maps, properties that are particularly beneficial for quantization and low-precision training.