pLSTM: parallelizable Linear Source Transition Mark networks

Korbinian Pöppel, Richard Freinschlag, Thomas Schmied, Wei Lin, Sepp Hochreiter

2025-06-16

Summary

This paper introduces pLSTM, a new type of linear recurrent neural network designed to work efficiently on complex data structures called directed acyclic graphs, or DAGs. It uses three special gates, named Source, Transition, and Mark, to process information in parallel, meaning it can handle data faster and, on tasks that involve understanding long-range connections, better than popular models like Transformers.

What's the problem?

The problem is that many AI models that process sequences or multi-dimensional data are slow or limited because they must handle data in a fixed order and struggle with long-distance relationships. Existing recurrent networks have trouble operating on complex structures like graphs, and Transformers sometimes fail to generalize well, especially on very large or detailed inputs.

What's the solution?

The solution is pLSTM, which extends linear recurrent neural networks to DAGs by using the Source, Transition, and Mark gates to control how information flows along the nodes and edges of the graph. Because the recurrence is linear, it can be computed in parallel, improving speed and stability on long-range dependencies. The model can operate in two modes for different types of information flow, and it can be implemented efficiently with modern computational techniques.
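The gate names above can be made concrete on the simplest possible DAG, a 1D chain, where each gate reduces to a per-step scalar. The sketch below is a toy illustration, not the paper's full multi-dimensional formulation (the function names and scalar gating are assumptions): the Source gate injects input into the cell, the Transition gate carries the cell forward, and the Mark gate reads it out. Because the recurrence is linear, the same outputs can also be computed in a loop-free closed form, which is what makes parallelization possible:

```python
import numpy as np

def plstm_chain_sequential(x, S, T, M):
    # Sequential linear recurrence on a 1D chain (simplest DAG):
    #   c_t = T_t * c_{t-1} + S_t * x_t   (Transition carries state,
    #                                      Source injects the input)
    #   y_t = M_t * c_t                   (Mark reads the state out)
    c = 0.0
    ys = []
    for t in range(len(x)):
        c = T[t] * c + S[t] * x[t]
        ys.append(M[t] * c)
    return np.array(ys)

def plstm_chain_parallel(x, S, T, M):
    # Equivalent closed form: unrolling the linear recurrence gives
    #   y_t = M_t * sum_{s <= t} (prod_{r=s+1}^{t} T_r) * S_s * x_s,
    # so every output is a weighted sum over earlier inputs and can be
    # computed as one matrix product, with no sequential dependency.
    n = len(x)
    G = np.zeros((n, n))
    for t in range(n):
        for s in range(t + 1):
            # product of Transition gates along the path s -> t
            # (an empty slice gives the empty product, 1.0)
            G[t, s] = np.prod(T[s + 1 : t + 1])
    return M * (G @ (S * x))
```

The dense matrix `G` here is only for clarity; it costs quadratic memory, and efficient implementations would instead exploit the same linearity through chunked or scan-based parallel evaluation.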

Why it matters?

This matters because many real-world problems involve data that isn’t just a simple sequence but has complex relationships, like images, molecules, or networks. pLSTM’s ability to handle these structures faster and more accurately than Transformers on certain tasks means it can improve AI performance in areas like computer vision and graph analysis, leading to more powerful and efficient AI systems.

Abstract

pLSTMs are parallelizable linear RNNs designed for DAGs, demonstrating superior performance on long-range tasks and benchmarks compared to Transformers.