Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos, Spyros Gidaris, Nikos Komodakis
2025-05-19

Summary
This paper introduces MuToR, a method that lets language models learn to predict several tokens ahead at once by interleaving special learnable 'register' tokens into the sequence, with each register token dedicated to predicting a specific future token.
What's the problem?
The problem is that most language models are trained to predict just one token at a time. This single-step objective gives the model a narrow training signal, which can make it less efficient and sometimes less accurate when generating longer pieces of text or modeling more complex language patterns.
What's the solution?
The researchers created MuToR, which interleaves learnable register tokens into the training sequence; each register token is tasked with predicting a token several steps into the future, giving the model an explicit look-ahead objective, all without making the model much bigger or changing its basic architecture.
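To make the idea concrete, here is a minimal, hypothetical sketch of how such a training sequence could be built. The function name build_mutor_inputs, the placeholder ids REGISTER_ID and IGNORE_INDEX, and the register-after-every-token placement are illustrative assumptions, not the paper's actual code; the paper's exact placement strategy and attention-masking details may differ.

```python
import torch

# Illustrative placeholders, not identifiers from the paper's code.
REGISTER_ID = 50257   # a new learnable embedding appended to the vocabulary
IGNORE_INDEX = -100   # label value ignored by the cross-entropy loss

def build_mutor_inputs(tokens: torch.Tensor, d: int = 2):
    """Interleave register tokens and build mixed prediction labels.

    tokens: 1-D LongTensor of token ids, length T.
    d:      how far ahead each register token should predict.
    Returns (input_ids, labels), each of length 2 * T, where labels[i]
    is the target for the model's output at position i.
    """
    T = tokens.size(0)
    input_ids, labels = [], []
    for t in range(T):
        # Regular token: standard next-token prediction target.
        input_ids.append(tokens[t].item())
        labels.append(tokens[t + 1].item() if t + 1 < T else IGNORE_INDEX)
        # Register token: predicts a token d steps into the future.
        input_ids.append(REGISTER_ID)
        labels.append(tokens[t + d].item() if t + d < T else IGNORE_INDEX)
    return torch.tensor(input_ids), torch.tensor(labels)

# Example: with d = 2, the register following token x_t is supervised on x_{t+2}.
ids, labs = build_mutor_inputs(torch.arange(8), d=2)
```

One detail this sketch omits: for the registers to be droppable at inference without changing the model's behavior, the regular tokens presumably must not depend on them (for example, via attention masking); here all positions are simply concatenated for brevity.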
Why does it matter?
This matters because a richer, multi-token training signal can make language models better at understanding and generating text, which can improve everything from chatbots to translation tools, all while keeping the technology efficient and easy to adopt.
Abstract
MuToR is a multi-token prediction approach that interleaves learnable register tokens into the input sequence, assigning each the task of predicting a future target. It enhances language model pretraining and fine-tuning with negligible parameter overhead and no architectural changes.
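For completeness, here is a hedged sketch of the kind of training objective this implies: a single cross-entropy over all positions, with next-token targets at regular positions and future-token targets at register positions. It reuses the hypothetical helpers from the sketch above and assumes a HuggingFace-style causal LM whose forward pass returns .logits; the paper may combine or weight the two objectives differently.

```python
import torch.nn.functional as F

def mutor_loss(model, input_ids, labels):
    """One cross-entropy over next-token and future-token targets.

    input_ids, labels: 1-D tensors from build_mutor_inputs, where labels
    are already aligned with the model's outputs (labels[i] is the target
    for position i) and invalid positions carry IGNORE_INDEX (-100).
    """
    logits = model(input_ids.unsqueeze(0)).logits   # (1, L, vocab_size)
    return F.cross_entropy(
        logits.squeeze(0),       # (L, vocab_size): prediction at every position
        labels,                  # mixed next-token / future-token targets
        ignore_index=-100,       # skip positions without a valid target
    )
```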