Empirical Evaluation of Progressive Coding for Sparse Autoencoders

Source: arXiv
Abstract

Sparse autoencoders (SAEs) \citep{bricken2023monosemanticity,gao2024scalingevaluatingsparseautoencoders} rely on dictionary learning to extract interpretable features from neural networks at scale in an unsupervised manner, with applications to representation engineering and information retrieval. SAEs are, however, computationally expensive \citep{lieberum2024gemmascopeopensparse}, especially when multiple SAEs of different sizes are needed. We show that dictionary importance in vanilla SAEs follows a power law. We compare progressive coding based on subset pruning of SAEs -- to jointly training nested SAEs, or so-called {\em Matryoshka} SAEs \citep{bussmann2024learning,nabeshima2024Matryoshka} -- on a language modeling task. We show Matryoshka SAEs exhibit lower reconstruction loss and recaptured language modeling loss, as well as higher representational similarity. Pruned vanilla SAEs are more interpretable, however. We discuss the origins and implications of this trade-off.
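
The abstract contrasts two ways of obtaining smaller SAEs: pruning a trained vanilla SAE down to its most important dictionary atoms (whose importances follow a power law) versus jointly training nested Matryoshka SAEs. Below is a minimal sketch, not the authors' code, of the subset-pruning variant; the importance score (mean latent activation scaled by decoder column norm) and all names such as `prune_sae` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """A vanilla ReLU sparse autoencoder over model activations."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.enc(x))   # sparse codes
        return self.dec(z), z         # reconstruction, codes

def prune_sae(sae: SparseAutoencoder, acts: torch.Tensor, k: int) -> SparseAutoencoder:
    """Keep the k most important dictionary atoms to form a smaller SAE.
    Importance here is the mean activation of each latent times its decoder
    column norm; the paper reports such importances follow a power law."""
    with torch.no_grad():
        _, z = sae(acts)
        importance = z.mean(dim=0) * sae.dec.weight.norm(dim=0)
        keep = importance.topk(k).indices
        small = SparseAutoencoder(sae.dec.weight.shape[0], k)
        small.enc.weight.copy_(sae.enc.weight[keep])
        small.enc.bias.copy_(sae.enc.bias[keep])
        small.dec.weight.copy_(sae.dec.weight[:, keep])
        small.dec.bias.copy_(sae.dec.bias)
    return small

if __name__ == "__main__":
    torch.manual_seed(0)
    sae = SparseAutoencoder(d_model=64, d_dict=512)
    acts = torch.randn(1024, 64)        # stand-in for language model activations
    small = prune_sae(sae, acts, k=128) # one step of the progressive-coding hierarchy
    recon, _ = small(acts)
    print(recon.shape)                  # torch.Size([1024, 64])
```

Pruning this way yields a nested family of SAEs from a single trained model, whereas Matryoshka SAEs train the nested dictionaries jointly; the paper evaluates the resulting trade-off in reconstruction loss, recaptured language modeling loss, representational similarity, and interpretability.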

Hans Peter, Anders Søgaard

Computing technology; computer technology

Hans Peter, Anders Søgaard. Empirical Evaluation of Progressive Coding for Sparse Autoencoders [EB/OL]. (2025-04-30) [2025-07-22]. https://arxiv.org/abs/2505.00190.