
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Adam Filipek

2025-10-07


Summary

This paper introduces a new type of Transformer model, called the Reactive Transformer (RxT), designed to make conversational AI systems, like chatbots, much better at handling long conversations.

What's the problem?

Traditional Transformer models, which power many AI systems today, struggle with long conversations because they have to re-process the *entire* conversation history every time you say something. This takes a lot of computing power and time, making them slow and expensive for extended back-and-forths. The amount of work grows with the square of the conversation length, so long chats quickly become prohibitively costly, as the quick calculation below illustrates.
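To see why full-history re-processing is quadratic, here is a toy calculation (our own illustration, not from the paper) counting the tokens a stateless chatbot re-reads over a conversation, assuming each turn adds about the same number of tokens:

```python
# Toy illustration (assumed numbers, not from the paper): tokens a stateless
# chatbot must process over a whole conversation if it re-reads the full
# history at every turn.
TOKENS_PER_TURN = 100  # assumed size of one query + response

def stateless_tokens_processed(num_turns: int, tokens_per_turn: int = TOKENS_PER_TURN) -> int:
    """Turn n re-processes n * tokens_per_turn tokens, so the total is
    T + 2T + ... + NT = T * N * (N + 1) / 2 -- quadratic in the turn count."""
    return sum(turn * tokens_per_turn for turn in range(1, num_turns + 1))

for turns in (10, 100, 1000):
    print(turns, stateless_tokens_processed(turns))
# 10 turns   ->      5,500 tokens
# 100 turns  ->    505,000 tokens
# 1000 turns -> 50,050,000 tokens: the total grows with the square of the turn count.
```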

What's the solution?

The RxT solves this by changing how it handles conversations. Instead of re-reading everything each time, it treats each turn as a separate event and uses a dedicated 'short-term memory' system to keep track of what’s been said. It quickly generates a response and *then* updates its memory in the background. This separation means the model doesn't need to re-process the whole conversation for every response, reducing the computational load from growing with the square of the conversation length to growing linearly with it.
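A minimal sketch of that operational cycle, using assumed class and method names rather than the authors' actual code, might look like this: the reply is generated synchronously from the fixed-size memory, and the memory update runs afterwards in the background.

```python
# Sketch only: the components (decoder, encoder, memory_attention, stm) mirror
# the paper's description, but the interfaces here are our assumptions.
import asyncio

class ReactiveTransformerSketch:
    def __init__(self, decoder, encoder, memory_attention, initial_stm):
        self.decoder = decoder                    # generator-decoder
        self.encoder = encoder                    # memory-encoder
        self.memory_attention = memory_attention  # merges new info into memory
        self.stm = initial_stm                    # fixed-size short-term memory

    async def handle_turn(self, query: str) -> str:
        # 1) Synchronous, user-facing path: generate the reply from the query
        #    and the *previous* memory state -- no re-reading of old turns.
        response = self.decoder.generate(query, memory=self.stm)

        # 2) Asynchronous path: encode the whole interaction and fold it into
        #    the short-term memory, off the user-facing latency path.
        asyncio.create_task(self._update_memory(query, response))
        return response

    async def _update_memory(self, query: str, response: str) -> None:
        interaction = self.encoder.encode(query, response)
        self.stm = self.memory_attention.update(self.stm, interaction)
```

Because the memory write happens after the response is returned, the user-perceived latency depends only on the current turn, not on how long the conversation has been running.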

Why does it matter?

This is important because it makes truly real-time, stateful conversations possible. 'Stateful' means the AI remembers what you've said previously. By making these conversations faster and cheaper, the RxT opens the door to more practical and engaging AI assistants that can handle complex, extended dialogues without becoming sluggish or prohibitively expensive.

Abstract

The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and the quadratic computational complexity (O(L^2)) with respect to sequence length L. Current models emulate memory by reprocessing an ever-expanding conversation history with each turn, leading to prohibitive costs and latency in long dialogues. This paper introduces the Reactive Transformer (RxT), a novel architecture designed to overcome these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn as a discrete event in real-time, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture features a distinct operational cycle where a generator-decoder produces a response based on the current query and the previous memory state, after which a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic (O(N^2 · T)) to linear (O(N · T)) with respect to the number of interactions N. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated our architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.
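The quadratic-to-linear claim follows from summing the per-turn work; a back-of-the-envelope version of that accounting (our simplification, assuming roughly T tokens per interaction and a fixed-size STM) is:

```latex
% Rough cost accounting (our simplification, not quoted from the paper).
% Stateless LLM: turn n re-processes all n*T accumulated tokens.
\mathrm{Cost}_{\mathrm{stateless}}(N) = \sum_{n=1}^{N} n\,T = \frac{N(N+1)}{2}\,T = O(N^{2} \cdot T)
% RxT: each turn touches only the ~T new tokens plus a fixed-size STM.
\mathrm{Cost}_{\mathrm{RxT}}(N) = \sum_{n=1}^{N} \left(T + |\mathrm{STM}|\right) = O(N \cdot T)
```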