COUNTDOWN: Contextually Sparse Activation Filtering Out Unnecessary Weights in Down Projection

Source: arXiv
Abstract

The growing size of large language models has created significant computational inefficiencies. To address this challenge, sparse activation methods selectively deactivate non-essential parameters during inference, reducing computational costs in FFNN layers. While existing methods focus on non-linear gating mechanisms, we hypothesize that the sparsity of the FFNN layer lies globally in the form of a linear combination over its internal down projection matrix. Based on this insight, we propose two methods: M-COUNTDOWN, which leverages indirect coefficients, and D-COUNTDOWN, which uses the direct coefficients of the linear combination. Experimental results demonstrate that D-COUNTDOWN can ideally omit 90% of computations with performance loss as low as 5.5%, while M-COUNTDOWN provides a predictor-free solution with up to 29.4% better performance preservation than existing methods. Our specialized kernel implementations effectively translate these theoretical gains into substantial real-world acceleration.
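
To make the linear-combination view concrete, here is a minimal PyTorch sketch. It assumes a SwiGLU-style gated FFNN (common in recent LLMs, though the abstract does not name a specific variant), in which the output is a linear combination of the down-projection matrix's columns, weighted by the coefficients silu(W_gate x) * (W_up x). The function name, the top-k selection rule, and the keep_ratio/mode parameters are illustrative stand-ins, not details from the paper:

```python
import torch
import torch.nn.functional as F

def countdown_ffnn(x, W_gate, W_up, W_down, keep_ratio=0.1, mode="direct"):
    """Illustrative sketch (not the paper's implementation) of coefficient-based
    sparsification in a SwiGLU-style FFNN:
        y = (silu(x @ W_gate.T) * (x @ W_up.T)) @ W_down.T
    """
    gate = F.silu(x @ W_gate.T)   # non-linear gate activations
    up = x @ W_up.T               # linear up-projection
    # Direct coefficients: y is a linear combination of W_down's columns,
    # weighted elementwise by coeff.
    coeff = gate * up
    # "direct" scores by the coefficients themselves (D-COUNTDOWN's signal);
    # "indirect" scores by the gate output alone, a predictor-free proxy
    # (M-COUNTDOWN's signal).
    score = coeff.abs() if mode == "direct" else gate.abs()
    k = max(1, int(keep_ratio * score.shape[-1]))
    idx = score.topk(k, dim=-1).indices
    mask = torch.zeros_like(coeff).scatter(-1, idx, 1.0)
    # Zeroed coefficients mean the matching columns of W_down contribute
    # nothing; a real sparse kernel would skip them outright.
    return (coeff * mask) @ W_down.T
```

The dense mask is only for exposition: the acceleration reported in the abstract comes from specialized kernels that skip the deselected columns of W_down entirely rather than multiplying them by zero.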

Jaewon Cheon, Pilsung Kang

Computing Technology, Computer Technology

Jaewon Cheon, Pilsung Kang. COUNTDOWN: Contextually Sparse Activation Filtering Out Unnecessary Weights in Down Projection [EB/OL]. (2025-05-23) [2025-06-29]. https://arxiv.org/abs/2505.17701
