Linear Attention Modeling for Learned Image Compression
In recent years, learned image compression has made tremendous progress and achieved impressive coding efficiency. Its coding gain comes mainly from non-linear neural network-based transforms and learnable entropy modeling. However, most studies focus on building a strong backbone, and few consider low-complexity design. In this paper, we propose LALIC, a linear attention modeling approach for learned image compression. Specifically, we use Bi-RWKV blocks, whose Spatial Mix and Channel Mix modules achieve more compact feature extraction, and apply a convolution-based Omni-Shift module to adapt to two-dimensional latent representations. Furthermore, we propose an RWKV-based Spatial-Channel ConTeXt model (RWKV-SCCTX) that leverages Bi-RWKV to model the correlation between neighboring features effectively. To our knowledge, this is the first work to use efficient Bi-RWKV models with linear attention for learned image compression. Experimental results demonstrate that our method achieves competitive RD performance, with BD-rates of -15.26%, -15.41%, and -17.63% relative to VTM-9.1 on the Kodak, CLIC, and Tecnick datasets, respectively. The code is available at https://github.com/sjtu-medialab/RwkvCompress.
Shen Wang, Ronghua Wu, Zhengxue Cheng, Donghui Feng, Guo Lu, Hongwei Hu, Li Song
Computing technology, computer technology
Shen Wang, Ronghua Wu, Zhengxue Cheng, Donghui Feng, Guo Lu, Hongwei Hu, Li Song. Linear Attention Modeling for Learned Image Compression [EB/OL]. (2025-02-08) [2025-05-18]. https://arxiv.org/abs/2502.05741.