Adaptive Computation Pruning for the Forgetting Transformer
Zhixuan Lin, Johan Obando-Ceron, Xu Owen He, Aaron Courville
2025-04-16
Summary
This paper introduces Adaptive Computation Pruning (ACP), a technique that makes the Forgetting Transformer (FoX) train much faster and more efficiently without hurting its performance.
What's the problem?
Transformer models require a lot of compute, and softmax attention is a major contributor: its cost grows with sequence length, which slows down training and raises its expense. This makes such models hard to use in settings where speed and cost matter.
What's the solution?
The researchers applied ACP to the Forgetting Transformer, whose forget gates decay the influence of distant context. ACP exploits this: when the decay has made certain attention computations negligible, the model automatically skips them during training. This prunes roughly 70% of the FLOPs in softmax attention while leaving the model's performance unchanged, making training 10% to 35% faster.
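The skipping idea can be illustrated at the block level: if the cumulative forget-gate decay between a block of queries and a block of keys is already so large that every attention weight in that block pair is negligible, the whole block can be pruned. Below is a minimal NumPy sketch of this logic; the function name, block size, and threshold are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def prune_mask(log_f, block=4, threshold=-10.0):
    """Block-level pruning mask based on cumulative forget-gate decay.

    log_f: (T,) per-position log forget-gate values (all <= 0).
    Returns an (n, n) boolean mask over (query block, key block) pairs:
    True = compute this block pair, False = prune it because every
    attention logit in it is decayed below `threshold`.
    """
    T = len(log_f)
    c = np.cumsum(log_f)  # c[i] = sum of log forget gates up to position i
    n = T // block
    keep = np.zeros((n, n), dtype=bool)
    for qi in range(n):
        for kj in range(qi + 1):  # causal: key block index <= query block index
            q_lo = qi * block               # earliest query in the block
            k_hi = kj * block + block - 1   # latest key in the block
            # Decay for query i attending key j is c[i] - c[j] (<= 0).
            # It is largest (least negative) for the earliest query and
            # the latest key, so that pair bounds the whole block.
            max_decay = c[q_lo] - c[k_hi] if k_hi < q_lo else 0.0
            keep[qi, kj] = max_decay > threshold
    return keep
```

With strong decay, far-off-diagonal blocks are pruned while blocks near the diagonal are kept, which is why the savings concentrate on long-range attention.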
Why does it matter?
This matters because it makes powerful AI models more practical and affordable, especially for people and organizations without access to large-scale compute. It also saves energy and makes AI technology more accessible.
Abstract
Adaptive Computation Pruning (ACP) applied to the Forgetting Transformer (FoX) reduces the number of FLOPs in softmax attention by about 70% without degrading performance, improving training throughput by 10% to 35%.