
Steering Autoregressive Music Generation with Recursive Feature Machines

Daniel Zhao, Daniel Beaglehole, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack

2025-10-23


Summary

This paper introduces a new way to control what music is created by AI, allowing for more precise adjustments to the music without needing to retrain the AI model or introduce unwanted sounds.

What's the problem?

Currently, controlling AI music generation is difficult. Existing methods either require completely retraining the AI model whenever you want a change, or they create noticeable flaws in the music when you try to steer it towards a specific sound. It's hard to get the AI to consistently create music with exactly the qualities you want.

What's the solution?

The researchers developed a system called MusicRFM that figures out how the AI 'thinks' about music internally. It analyzes the AI's internal gradients to identify specific 'concept directions' – axes in the model's activation space that correspond to musical attributes like particular notes or chords. Then, they gently nudge the AI's activations along those directions *while* it is generating music, guiding it toward the desired sounds without retraining the underlying model or causing obvious audio errors. They also developed ways to vary these nudges over time and to apply several controls at once.
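The core steering idea can be sketched in a few lines. This is a minimal toy illustration, not the paper's actual implementation: the function name, the array shapes, and the use of plain NumPy instead of the real model's hidden states are all assumptions made for clarity.

```python
import numpy as np

def steer_activations(hidden, direction, strength):
    """Nudge hidden states along a unit-norm concept direction.

    hidden: (timesteps, hidden_dim) activations from a frozen model (toy stand-in).
    direction: a 'concept direction' such as one an RFM probe might find (hypothetical).
    strength: how hard to push toward the concept.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + strength * unit  # broadcast the nudge across all timesteps

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))    # toy sizes; real hidden dims are much larger
direction = rng.normal(size=8)      # stand-in for a learned concept direction
steered = steer_activations(hidden, direction, strength=2.0)
```

Because the nudge is a simple vector addition applied at inference time, the base model's weights never change, which is what lets the method work on a frozen, pre-trained model.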

Why it matters?

This is important because it gives musicians and creators much more control over AI-generated music. They can fine-tune the output to their exact specifications without the hassle of retraining or a loss in audio quality. The jump in accuracy when targeting a specific musical note, from 23% to 82%, marks a significant step forward in controllable music generation, and the steered model still follows the original text instructions given to the AI.

Abstract

Controllable music generation remains a significant challenge, with existing methods often requiring model retraining or introducing audible artifacts. We introduce MusicRFM, a framework that adapts Recursive Feature Machines (RFMs) to enable fine-grained, interpretable control over frozen, pre-trained music models by directly steering their internal activations. RFMs analyze a model's internal gradients to produce interpretable "concept directions", or specific axes in the activation space that correspond to musical attributes like notes or chords. We first train lightweight RFM probes to discover these directions within MusicGen's hidden states; then, during inference, we inject them back into the model to guide the generation process in real-time without per-step optimization. We present advanced mechanisms for this control, including dynamic, time-varying schedules and methods for the simultaneous enforcement of multiple musical properties. Our method successfully navigates the trade-off between control and generation quality: we can increase the accuracy of generating a target musical note from 0.23 to 0.82, while text prompt adherence remains within approximately 0.02 of the unsteered baseline, demonstrating effective control with minimal impact on prompt fidelity. We release code to encourage further exploration on RFMs in the music domain.
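The abstract mentions dynamic, time-varying schedules and the simultaneous enforcement of multiple musical properties. A hedged sketch of how such a schedule might combine several concept directions is below; the linear decay, the function name, and the NumPy arrays are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def scheduled_steer(hidden, directions, strengths, step, total_steps):
    """Apply several concept directions at once, with a decaying schedule.

    A linearly decaying weight (an assumed schedule, for illustration) steers
    strongly early in generation and fades out by the final step.
    """
    decay = 1.0 - step / total_steps
    out = hidden.copy()
    for d, s in zip(directions, strengths):
        unit = d / np.linalg.norm(d)
        out = out + decay * s * unit
    return out

rng = np.random.default_rng(1)
hidden = rng.normal(size=(4, 8))
dirs = [rng.normal(size=8), rng.normal(size=8)]  # e.g. a note and a chord direction
early = scheduled_steer(hidden, dirs, strengths=[2.0, 1.0], step=0, total_steps=10)
late = scheduled_steer(hidden, dirs, strengths=[2.0, 1.0], step=10, total_steps=10)
```

At the final step the decay reaches zero, so `late` equals the unsteered activations, which is one simple way to trade off control against generation quality over time.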