RWKV-7 "Goose" with Expressive Dynamic State Evolution

Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Haowen Hou, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng

2025-03-19

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Summary

This paper introduces RWKV-7 "Goose," a new sequence-modeling architecture for language that is both efficient and powerful.

What's the problem?

Existing AI language models, such as Transformers, often require memory and processing time that grow with the length of the text. In addition, many models struggle with languages other than English.

What's the solution?

RWKV-7 uses a recurrent design that needs only a constant amount of memory and a constant amount of computation per token, no matter how long the text gets. It has also been trained on a huge multilingual corpus, making it good at tasks in many languages. At its core is a generalized version of the delta rule, a learning-style update applied inside the model that helps it track information over time and even recognize patterns (regular languages) that Transformers are believed unable to handle.
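The "constant memory and computation per token" claim can be illustrated with a minimal sketch: a recurrent model keeps a fixed-size state that is updated once per token, so nothing grows with sequence length (unlike a Transformer's key-value cache). All names and sizes below are illustrative, not the actual RWKV-7 implementation.

```python
import numpy as np

d = 8  # head/state dimension (illustrative)

def step(state, k, v, decay):
    """One token of recurrent inference: update a fixed d x d state."""
    # Decay the old state per channel, then write the new
    # key/value association into it as an outer product.
    return state * decay[:, None] + np.outer(v, k)

rng = np.random.default_rng(0)
state = np.zeros((d, d))
for _ in range(1000):  # process 1000 tokens; the state never grows
    k, v = rng.normal(size=d), rng.normal(size=d)
    decay = rng.uniform(0.9, 1.0, size=d)
    state = step(state, k, v, decay)

print(state.shape)  # still (d, d) after any number of tokens
```

Because the per-token work is a few fixed-size matrix operations, both memory and latency per token are O(1) in sequence length.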

Why it matters?

This work matters because it offers a more efficient and versatile model for language processing, useful for applications such as translation, text generation, and multilingual understanding, while keeping inference cheap even on long inputs.

Abstract

We present RWKV-7 "Goose", a new sequence modeling architecture, along with pre-trained language models that establish a new state-of-the-art in downstream performance at the 3 billion parameter scale on multilingual tasks, and match current SoTA English language performance despite being trained on dramatically fewer tokens than other top 3B models. Nevertheless, RWKV-7 models require only constant memory usage and constant inference time per token. RWKV-7 introduces a newly generalized formulation of the delta rule with vector-valued gating and in-context learning rates, as well as a relaxed value replacement rule. We show that RWKV-7 can perform state tracking and recognize all regular languages, while retaining parallelizability of training. This exceeds the capabilities of Transformers under standard complexity conjectures, which are limited to TC^0. To demonstrate RWKV-7's language modeling capability, we also present an extended open source 3.1 trillion token multilingual corpus, and train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on this dataset. To foster openness, reproduction, and adoption, we release our models and dataset component listing at https://huggingface.co/RWKV, and our training and inference code at https://github.com/RWKV/RWKV-LM all under the Apache 2.0 License.