A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Thomas Schmied, Thomas Adler, Vihang Patil, Maximilian Beck, Korbinian Pöppel, Johannes Brandstetter, Günter Klambauer, Razvan Pascanu, Sepp Hochreiter

2024-10-31

Summary

This paper introduces the Large Recurrent Action Model (LRAM), a large action model built around the xLSTM architecture, designed to improve the speed and efficiency of AI systems used in robotics. It combines the strengths of recurrent neural networks with modern parallelizable training techniques to enable fast decision-making in real-time applications.

What's the problem?

Many existing AI models, particularly those based on the Transformer architecture, are powerful but slow at making decisions in real-time settings like robotics. Because a Transformer's per-step cost grows with the length of the sequence it attends over, its inference is too slow for tasks that require quick responses, such as navigating or manipulating objects.

What's the solution?

The authors propose a Large Recurrent Action Model (LRAM) that uses xLSTM at its core. This model processes sequential data efficiently, allowing it to make decisions quickly without sacrificing performance. The xLSTM architecture provides linear-time inference complexity: each step updates a fixed-size state rather than attending over the whole history, so it handles longer action sequences more effectively than Transformer-based models. The researchers tested LRAM on 432 tasks across six domains and found that it compares favorably to existing Transformer models in both speed and accuracy.
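The difference in inference complexity can be sketched with a toy comparison (this is an illustrative simplification, not the paper's xLSTM implementation; all dimensions and functions here are hypothetical): a recurrent model updates a fixed-size state each step, while a Transformer-style decoder attends over a cache that grows with every generated action.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy state size (hypothetical, not the paper's configuration)

# Recurrent step (LSTM-style): cost and memory are constant per token,
# so generating T actions costs O(T) overall -- linear-time inference.
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def recurrent_step(h, x):
    # Fixed-size state update: O(d^2) work regardless of how many
    # steps came before.
    return np.tanh(W @ x + U @ h)

# Transformer-style step: the key cache grows with the sequence, so
# step t attends over t cached keys -- O(T^2) work overall.
def attention_step(cache, q):
    K = np.stack(cache)                    # t keys so far
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    return (weights / weights.sum()) @ K   # weighted sum over history

h = np.zeros(d)
cache = []
for _ in range(16):
    x = rng.standard_normal(d)
    h = recurrent_step(h, x)       # state shape never grows
    cache.append(x)
    _ = attention_step(cache, x)   # work grows with len(cache)
```

After the loop, the recurrent state is still a single length-`d` vector, while the attention cache holds all 16 past inputs; that gap is what makes recurrent inference attractive for real-time robotics.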

Why it matters?

This research is significant because it addresses the need for faster and more efficient AI systems in robotics. By speeding up inference without giving up performance, LRAM-style recurrent models can help robots perform complex tasks in real time, enabling advances in automation and intelligent systems that operate in real-world environments.

Abstract

In recent years, there has been a trend in the field of Reinforcement Learning (RL) towards large action models trained offline on large-scale datasets via sequence modeling. Existing models are primarily based on the Transformer architecture, which results in powerful agents. However, due to slow inference times, Transformer-based approaches are impractical for real-time applications, such as robotics. Recently, modern recurrent architectures, such as xLSTM and Mamba, have been proposed that exhibit parallelization benefits during training similar to the Transformer architecture while offering fast inference. In this work, we study the aptitude of these modern recurrent architectures for large action models. Consequently, we propose a Large Recurrent Action Model (LRAM) with an xLSTM at its core that comes with linear-time inference complexity and natural sequence length extrapolation abilities. Experiments on 432 tasks from 6 domains show that LRAM compares favorably to Transformers in terms of performance and speed.