FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression

Source: arXiv

Abstract

Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments. Although recent low-rank decomposition methods offer a promising path to structural compression, they often suffer from accuracy degradation and expensive calibration procedures, and they yield inefficient model architectures that hinder real-world inference speedups. In this paper, we propose FLAT-LLM, a fast, accurate, and training-free structural compression method based on fine-grained low-rank transformations in the activation space. Specifically, we reduce the hidden dimension by transforming the weights using truncated eigenvectors computed via head-wise Principal Component Analysis (PCA), and we employ an importance-based metric to adaptively allocate ranks across decoders. FLAT-LLM achieves efficient and effective weight compression without recovery fine-tuning, and its calibration completes within a few minutes. Evaluated across 4 models and 11 datasets, FLAT-LLM outperforms structural pruning baselines in generalization and downstream performance, while delivering inference speedups over decomposition-based methods.
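
To make the head-wise PCA idea concrete, the following is a minimal sketch of the general recipe the abstract describes: collect per-head calibration activations, keep the leading eigenvectors of their covariance, and fold the truncated projection into the adjacent weight matrices so the per-head dimension shrinks. The function names, the weight layout (`w_v`, `w_o`), and the simple proportional rank allocator are illustrative assumptions, not the authors' implementation; in particular, the paper's actual importance metric for allocating ranks across decoders is not specified in the abstract.

```python
import torch


def headwise_pca_projection(head_activations: torch.Tensor, rank: int) -> torch.Tensor:
    """Top-`rank` principal directions of one head's calibration activations.

    head_activations: (num_tokens, head_dim) activations collected on a
    small calibration set for a single attention head.
    Returns an orthonormal projection of shape (head_dim, rank).
    """
    # Center the activations and form the empirical covariance matrix.
    centered = head_activations - head_activations.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / centered.shape[0]
    # Eigendecomposition of the symmetric covariance; eigenvalues are returned
    # in ascending order, so the last `rank` columns are the leading eigenvectors.
    _, eigvecs = torch.linalg.eigh(cov)
    return eigvecs[:, -rank:]


def fold_projection_into_weights(w_v: torch.Tensor, w_o: torch.Tensor,
                                 proj: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Absorb the truncated projection into a value/output weight pair so the
    per-head intermediate dimension shrinks from head_dim to rank.

    w_v:  (head_dim, hidden_dim)  value-projection slice for one head (assumed layout)
    w_o:  (hidden_dim, head_dim)  matching slice of the output projection
    proj: (head_dim, rank)        truncated eigenvectors from head-wise PCA
    """
    w_v_small = proj.T @ w_v   # (rank, hidden_dim)
    w_o_small = w_o @ proj     # (hidden_dim, rank)
    return w_v_small, w_o_small


def allocate_ranks(importance: torch.Tensor, rank_budget: int,
                   max_rank: int) -> torch.Tensor:
    """Toy rank allocator: split a global rank budget across decoders in
    proportion to a per-decoder importance score (stand-in for the paper's
    importance-based metric, which the abstract does not detail)."""
    shares = importance / importance.sum()
    return (shares * rank_budget).round().long().clamp(1, max_rank)
```

In a scheme like this, the truncated projection is absorbed directly into the weight matrices, so the compressed layers keep an ordinary dense layout at inference time, which is consistent with the abstract's claim of real-world speedups without recovery fine-tuning.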

Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang

Computing Technology, Computer Technology

Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang. FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression [EB/OL]. (2025-05-29) [2025-06-19]. https://arxiv.org/abs/2505.23966.
