Reinforced Fast Weights with Next-Sequence Prediction
Hee Seung Hwang, Xindi Wu, Sanghyuk Chun, Olga Russakovsky
2026-02-19
Summary
This paper focuses on improving how 'fast weight' models handle long pieces of text, like entire books or articles. These models are a promising alternative to the popular 'transformer' models because their memory use stays constant no matter how long the input gets, but so far they haven't performed as well on very long inputs.
What's the problem?
Traditional training methods for fast weight models focus on predicting the *next* word in a sequence. That works reasonably well, but it never rewards the model for producing a coherent stretch of text, so it doesn't teach the model to connect words that are far apart. Because fast weight models update their own weights as they read to store context, this limited training signal leaves them struggling to capture long-range relationships, which hinders their ability to process long texts effectively.
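For intuition, here is what that standard next-token objective looks like in PyTorch. This is only an illustration of the general NTP loss the summary describes, not code from the paper, and the toy tensor sizes are made up; the point is simply that each position's loss involves only the single token that follows it.

```python
import torch
import torch.nn.functional as F

# Standard next-token prediction (NTP): each position is trained to predict
# only the one token that immediately follows it, so nothing in the loss
# rewards coherence over a longer, multi-token continuation.
seq_len, vocab_size = 8, 100                      # toy sizes for illustration
logits = torch.randn(seq_len, vocab_size)         # model outputs, one per position
targets = torch.randint(vocab_size, (seq_len,))   # the input shifted by one token
ntp_loss = F.cross_entropy(logits, targets)       # average per-token loss
```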
What's the solution?
The researchers developed a new training method called REFINE. It uses reinforcement learning, a technique that works by trial and error. Instead of predicting just the next word, REFINE asks the model to generate whole sequences of words and rewards it when the continuation is good. It picks the training positions where the model is most uncertain, measured by prediction entropy, and uses an optimization technique called GRPO to keep the learning process efficient. REFINE can be applied at different stages of a model's life: mid-training, post-training, and even at test time while the model is being used.
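The paper's exact procedure isn't reproduced here, so the following is only a minimal sketch of the entropy-based position selection the summary describes. The function name `select_rollout_positions`, the tensor shapes, and the top-k rule are assumptions for illustration, not the authors' implementation.

```python
import torch

def select_rollout_positions(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k most uncertain positions as starting points for rollouts.

    logits: [seq_len, vocab_size] next-token logits from one forward pass
    over the context. Positions with the highest predictive entropy are
    treated as the most informative places to practice sequence prediction.
    """
    log_probs = torch.log_softmax(logits, dim=-1)          # [seq_len, vocab]
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)   # [seq_len]
    return torch.topk(entropy, k=min(k, entropy.numel())).indices
```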
Why it matters?
REFINE significantly improves the performance of fast weight models on tasks that require understanding long texts, like finding a specific piece of information within a large document or answering questions about a lengthy article. This matters because it makes fast weight models a more viable option for the ever-growing amount of long-form data we have, and it offers an approach different from the currently dominant transformer models.
Abstract
Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token predictions and ignores semantic coherence across multiple tokens following a prefix. Consequently, fast weight models, which dynamically update their parameters to store contextual information, learn suboptimal representations that fail to capture long-range dependencies. We introduce REFINE (Reinforced Fast weIghts with Next sEquence prediction), a reinforcement learning framework that trains fast weight models under the next-sequence prediction (NSP) objective. REFINE selects informative token positions based on prediction entropy, generates multi-token rollouts, assigns self-supervised sequence-level rewards, and optimizes the model with group relative policy optimization (GRPO). REFINE is applicable throughout the training lifecycle of pre-trained language models: mid-training, post-training, and test-time training. Our experiments on LaCT-760M and DeltaNet-1.3B demonstrate that REFINE consistently outperforms supervised fine-tuning with NTP across needle-in-a-haystack retrieval, long-context question answering, and diverse tasks in LongBench. REFINE provides an effective and versatile framework for improving long-context modeling in fast weight architectures.
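To make the GRPO step in the abstract concrete, below is a minimal sketch of group-relative advantage computation. The helper name and tensor shapes are assumptions, and the clipped policy-ratio loss that GRPO wraps around these advantages is omitted; only the per-group reward normalization itself follows the published GRPO formulation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages, the core idea of GRPO.

    rewards: [num_prefixes, group_size] sequence-level rewards, one group of
    rollouts per selected prefix. Each rollout's advantage is its reward
    standardized against its own group, so no learned value network is needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: four rollouts from one prefix; the best rollout gets the largest
# positive advantage, the worst the most negative one.
print(grpo_advantages(torch.tensor([[0.2, 0.9, 0.5, 0.1]])))
```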