
Disentangling Polysemantic Channels in Convolutional Neural Networks


Source: arXiv
Abstract

Mechanistic interpretability is concerned with analyzing individual components in a (convolutional) neural network (CNN) and how they form larger circuits representing decision mechanisms. These investigations are challenging because CNNs frequently learn polysemantic channels that encode multiple distinct concepts, making them hard to interpret. To address this, we propose an algorithm to disentangle a specific kind of polysemantic channel into multiple channels, each responding to a single concept. Our approach restructures weights in a CNN, exploiting the fact that different concepts within the same channel exhibit distinct activation patterns in the previous layer. By disentangling these polysemantic features, we enhance the interpretability of CNNs, ultimately improving explanatory techniques such as feature visualizations.
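The abstract only describes the approach at a high level. The sketch below illustrates one plausible reading of the general idea in PyTorch: collect previous-layer activation patterns on inputs that strongly drive a polysemantic channel, cluster them into concept groups, and replace the channel with one copy per concept, each keeping only the incoming weights from input channels characteristic of that concept. This is not the authors' released algorithm; the function `split_polysemantic_channel`, the k-means clustering, and the above-average-activation masking heuristic are illustrative assumptions.

```python
# Hypothetical sketch of splitting a polysemantic channel; NOT the paper's code.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


def split_polysemantic_channel(conv: nn.Conv2d, channel_idx: int,
                               prev_acts: torch.Tensor, n_concepts: int = 2) -> nn.Conv2d:
    """Return a new Conv2d in which `channel_idx` is replaced by `n_concepts`
    channels, each keeping only incoming weights from input channels that are
    characteristically active for one concept cluster.

    prev_acts: (N, C_in) spatially pooled previous-layer activations, collected
               on inputs that strongly activate channel `channel_idx` (assumed
               to be available; how to collect them is not shown here).
    """
    # 1. Cluster the previous-layer activation patterns into concept groups.
    labels = KMeans(n_clusters=n_concepts, n_init=10).fit_predict(prev_acts.numpy())

    # 2. For each concept, mark input channels that are more active than average.
    mean_act = prev_acts.mean(dim=0)
    masks = []
    for k in range(n_concepts):
        cluster_mean = prev_acts[torch.as_tensor(labels) == k].mean(dim=0)
        masks.append((cluster_mean > mean_act).float())  # shape (C_in,)

    # 3. Build a new conv with (n_concepts - 1) extra output channels; the split
    #    channels share the original kernel but are masked per concept.
    new_out = conv.out_channels + n_concepts - 1
    new_conv = nn.Conv2d(conv.in_channels, new_out, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        keep = [i for i in range(conv.out_channels) if i != channel_idx]
        new_conv.weight[:len(keep)] = conv.weight[keep]
        for k, mask in enumerate(masks):
            new_conv.weight[len(keep) + k] = conv.weight[channel_idx] * mask.view(-1, 1, 1)
        if conv.bias is not None:
            new_conv.bias[:len(keep)] = conv.bias[keep]
            new_conv.bias[len(keep):] = conv.bias[channel_idx]
    return new_conv


# Toy usage with random data, purely to show the interface.
if __name__ == "__main__":
    conv = nn.Conv2d(8, 4, kernel_size=3, padding=1)
    prev_acts = torch.rand(64, 8)            # pooled activations of 64 images
    new_conv = split_polysemantic_channel(conv, channel_idx=2, prev_acts=prev_acts)
    print(new_conv.weight.shape)             # torch.Size([5, 8, 3, 3])
```

Note that this sketch only splits the incoming weights of one layer; a faithful disentanglement would also have to adjust the weights of the following layer that consume the split channels, which the abstract implies but does not detail.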

Robin Hesse, Jonas Fischer, Simone Schaub-Meyer, Stefan Roth

Subject: Computing Technology; Computer Technology

Robin Hesse, Jonas Fischer, Simone Schaub-Meyer, Stefan Roth. Disentangling Polysemantic Channels in Convolutional Neural Networks [EB/OL]. (2025-04-17) [2025-05-02]. https://arxiv.org/abs/2504.12939.
