Teach Old SAEs New Domain Tricks with Boosting
Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev, Gleb Gerasimov, Nikita Balagansky, Daniil Gavrilov
2025-07-18
Summary
This paper introduces a way to improve Sparse Autoencoders (SAEs) so they can learn features specific to new domains or types of data without being completely retrained.
What's the problem?
The problem is that regular Sparse Autoencoders are typically trained on general data, so they often fail to capture features that matter in specialized domains.
What's the solution?
The authors introduced a residual learning approach that trains a secondary SAE to model the errors made by the original SAE on new domain data. By combining the outputs of both models, they can better capture domain-specific features while keeping the original model's strengths.
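The composition described above can be sketched numerically. The toy below is a heavily simplified stand-in, not the paper's implementation: both "SAEs" are linear maps rather than sparse ReLU autoencoders, and the residual model is fit in closed form with least squares instead of gradient training. It only illustrates the core idea that a secondary model fit to the base model's reconstruction error, added back to the base output, reduces the combined error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations" from a new domain: 512 samples, 16 dimensions.
d, n = 16, 512
X = rng.normal(size=(n, d))

# Frozen "base SAE": stood in by a fixed rank-8 linear projection,
# so it necessarily loses part of each activation (its reconstruction error).
U = rng.normal(size=(d, 8))
P = U @ np.linalg.pinv(U)          # symmetric projection onto col(U)
base_recon = X @ P                 # base model's reconstruction

# Residual learner: fit the base model's error on the domain data.
# A linear least-squares stand-in for the secondary SAE.
residual = X - base_recon
W, *_ = np.linalg.lstsq(X, residual, rcond=None)
residual_recon = X @ W

# Combine both models' outputs, as in the residual learning approach.
combined = base_recon + residual_recon

err_base = np.mean((X - base_recon) ** 2)
err_comb = np.mean((X - combined) ** 2)
assert err_comb < err_base         # the residual model closes the gap
```

The frozen base model is never modified, so whatever it already captures is preserved; the secondary model only has to account for what the base model misses on the new domain.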
Why does it matter?
This matters because it allows AI systems to adapt more quickly and effectively to new tasks or areas without expensive retraining, making them more useful and interpretable for specialized applications.
Abstract
A residual learning approach enhances Sparse Autoencoders to capture domain-specific features without retraining, improving interpretability and performance on specialized domains.