FLARE: Fast Low-rank Attention Routing Engine

Vedant Puri, Aditya Joglekar, Kevin Ferguson, Yu-hsuan Chen, Yongjie Jessica Zhang, Levent Burak Kara

2025-08-21

Summary

This paper presents FLARE, a faster way to perform self-attention in neural network models, especially on the large, unstructured meshes used to represent complex shapes in areas like 3D printing.

What's the problem?

Standard self-attention compares every point in a shape with every other point, so the computation grows quadratically as the mesh gets larger. For very large or complicated shapes this becomes too slow and memory-hungry, which limits what these models can do.

What's the solution?

FLARE makes self-attention faster with a simple routing trick. Instead of comparing every part of a big shape with every other part, each attention head projects all of the input points onto a small, fixed-size set of learned latent tokens. Think of it like summarizing a long book into a few key sentences. Information is exchanged through this small summary and then routed back out to every point, which cuts the cost from quadratic to roughly linear in the number of points and lets the model handle much larger shapes, as sketched below.
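To make the idea concrete, here is a minimal sketch of attention routed through a fixed number of learnable latent tokens. This is not the authors' code (their implementation is at the GitHub link below); the class name, layer choices, and shapes are illustrative assumptions. The key point is that both attention steps cost O(N·M) rather than O(N²).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankAttentionRouting(nn.Module):
    """Illustrative sketch: route attention through M learnable latent tokens.

    Hypothetical single-head module for clarity; not the authors' implementation.
    """

    def __init__(self, dim: int, num_latents: int = 64):
        super().__init__()
        # M learnable query tokens that summarize the N input tokens
        self.latent_queries = nn.Parameter(torch.randn(num_latents, dim))
        self.to_kv = nn.Linear(dim, 2 * dim)
        self.to_q = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, dim) -- e.g. features at N mesh points
        b, n, d = x.shape
        k, v = self.to_kv(x).chunk(2, dim=-1)            # (b, N, d) each
        q_lat = self.latent_queries.expand(b, -1, -1)     # (b, M, d)

        # Gather: latent tokens attend to all N inputs -> cost O(N * M)
        gather = F.softmax(q_lat @ k.transpose(1, 2) / d ** 0.5, dim=-1)  # (b, M, N)
        latents = gather @ v                                               # (b, M, d)

        # Scatter: each input token attends back to the M latents -> cost O(N * M)
        q = self.to_q(x)                                                   # (b, N, d)
        scatter = F.softmax(q @ latents.transpose(1, 2) / d ** 0.5, dim=-1)  # (b, N, M)
        return self.out(scatter @ latents)                                 # (b, N, d)
```

Because M is a fixed constant (say 64) while N can be hundreds of thousands of mesh points, the two small attention maps replace the full N-by-N attention matrix, which is what makes the overall cost linear in N.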

Why it matters?

This is important because it lets these models tackle much larger and more complex problems than before. That can mean more accurate predictions and simulations in fields like designing manufactured parts or modeling complex physical systems. The authors also release a new additive manufacturing dataset, giving researchers both a better tool and new data to build on.

Abstract

The quadratic complexity of self-attention limits its applicability and scalability on large unstructured meshes. We introduce Fast Low-rank Attention Routing Engine (FLARE), a linear complexity self-attention mechanism that routes attention through fixed-length latent sequences. Each attention head performs global communication among N tokens by projecting the input sequence onto a fixed-length latent sequence of M ≪ N tokens using learnable query tokens. By routing attention through a bottleneck sequence, FLARE learns a low-rank form of attention that can be applied at O(NM) cost. FLARE not only scales to unprecedented problem sizes, but also delivers superior accuracy compared to state-of-the-art neural PDE surrogates across diverse benchmarks. We also release a new additive manufacturing dataset to spur further research. Our code is available at https://github.com/vpuri3/FLARE.py.