Resa: Transparent Reasoning Models via SAEs

Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Deqing Fu, Willie Neiswanger

2025-06-15

Summary

This paper introduces Resa, a technique that makes language models better at reasoning by using sparse autoencoders (SAEs). These SAEs identify important features in the model's internal activations that guide reasoning in a clearer, more transparent way.

What's the problem?

The problem is that improving reasoning in language models usually requires costly retraining, which demands significant time and computing resources. It's hard to make models reason better without going through this expensive process.

What's the solution?

The solution is SAE-Tuning, which uses sparse autoencoders to identify reasoning-related features in the model and adjust them efficiently. This method boosts the model's reasoning ability in a cost-effective way, without needing to retrain the entire model extensively.
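To make the sparse-autoencoder idea concrete, here is a minimal sketch of an SAE applied to a model's hidden activations. This is an illustrative toy example, not the paper's implementation: the dimensions, the ReLU encoder, and the L1 sparsity penalty are common SAE conventions assumed here, and all names are hypothetical.

```python
import numpy as np

# Toy sparse autoencoder (SAE) sketch. An SAE maps a model's hidden
# activations into an overcomplete set of features, keeping only a few
# active at a time, then reconstructs the original activations.
# Sizes and training details are illustrative, not from the paper.

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32                    # hidden size; overcomplete feature count
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def encode(h):
    # ReLU zeroes out weakly activated features, producing a sparse code
    return np.maximum(h @ W_enc + b_enc, 0.0)

def decode(f):
    # Linear reconstruction of the original activations from the sparse code
    return f @ W_dec + b_dec

def sae_loss(h, l1_coef=1e-3):
    f = encode(h)
    mse = np.mean((decode(f) - h) ** 2)   # reconstruction error
    sparsity = np.mean(np.abs(f))         # L1 penalty keeps few features active
    return mse + l1_coef * sparsity

h = rng.normal(size=(4, d_model))         # a batch of hidden activations
f = encode(h)                             # sparse features, shape (4, 32)
loss = sae_loss(h)
```

Once such features are learned, a method like SAE-Tuning can steer the ones associated with reasoning rather than updating every weight in the model.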

Why it matters?

This matters because it allows building language models that reason more effectively without the enormous time and compute normally required for retraining. That makes advanced AI more accessible and practical for many uses.

Abstract

SAE-Tuning efficiently elicits strong reasoning in language models by leveraging sparse autoencoders, enabling cost-effective performance gains without extensive retraining.