Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data

Source: arXiv
Abstract

Compressed Sparse Column (CSC) and Coordinate (COO) are popular compression formats for sparse matrices. However, both CSC and COO are general purpose and cannot take advantage of any of the properties of the data other than sparsity, such as data redundancy. Highly redundant sparse data is common in many machine learning applications, such as genomics, and is often too large for in-core computation using conventional sparse storage formats. In this paper, we present two extensions to CSC: (1) Value-Compressed Sparse Column (VCSC) and (2) Index- and Value-Compressed Sparse Column (IVCSC). VCSC takes advantage of high redundancy within a column to further compress data up to 3-fold over COO and 2.25-fold over CSC, without significant negative impact to performance characteristics. IVCSC extends VCSC by compressing index arrays through delta encoding and byte-packing, achieving a 10-fold decrease in memory usage over COO and 7.5-fold decrease over CSC. Our benchmarks on simulated and real data show that VCSC and IVCSC can be read in compressed form with little added computational cost. These two novel compression formats offer a broadly useful solution to encoding and reading redundant sparse data.
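
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the in-memory layout is an assumption made for illustration: the function names (`vcsc_compress_column`, `delta_encode`, `byte_pack`, and so on) are hypothetical, values are grouped per column in the spirit of VCSC, and row indices are delta-encoded and packed into the narrowest fixed byte width in the spirit of IVCSC.

```python
from collections import defaultdict


def vcsc_compress_column(rows, vals):
    """Group one column's nonzeros by value (assumed VCSC-like layout).

    Returns (values, counts, indices): each unique value is stored once,
    followed by how many times it occurs and the rows where it occurs.
    """
    groups = defaultdict(list)
    for r, v in zip(rows, vals):
        groups[v].append(r)
    values, counts, indices = [], [], []
    for v in sorted(groups):
        idx = sorted(groups[v])
        values.append(v)
        counts.append(len(idx))
        indices.extend(idx)
    return values, counts, indices


def vcsc_decompress_column(values, counts, indices):
    """Rebuild the (row, value) pairs of the column, sorted by row."""
    out, pos = [], 0
    for v, c in zip(values, counts):
        out.extend((r, v) for r in indices[pos:pos + c])
        pos += c
    return sorted(out)


def delta_encode(sorted_idx):
    """Replace each sorted index by its distance from the previous one."""
    deltas, prev = [], 0
    for i in sorted_idx:
        deltas.append(i - prev)
        prev = i
    return deltas


def byte_pack(deltas):
    """Pack deltas using the smallest byte width that fits the largest delta."""
    width = 1
    largest = max(deltas, default=0)
    while largest >= 1 << (8 * width):
        width += 1
    payload = b"".join(d.to_bytes(width, "little") for d in deltas)
    return width, payload


def byte_unpack(width, payload):
    """Invert byte_pack and delta_encode, returning the original indices."""
    out, total = [], 0
    for i in range(0, len(payload), width):
        total += int.from_bytes(payload[i:i + width], "little")
        out.append(total)
    return out


# A column with five nonzeros but only two distinct values.
rows = [0, 2, 5, 7, 9]
vals = [1, 1, 2, 1, 2]
values, counts, indices = vcsc_compress_column(rows, vals)
width, payload = byte_pack(delta_encode(indices[:counts[0]]))
```

In this example the five stored values shrink to two, and the three row indices of the first value fit in one byte each after delta encoding. The savings depend entirely on how redundant each column is, which is the property the abstract highlights.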

Skyler Ruiter, Seth Wolfgang, Marc Tunnell, Timothy Triche, Erin Carrier, Zachary DeBruine

DOI: 10.1109/BigData62323.2024.10825091

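Subjects: Computing technology, computer technology; biological science research methods, biological science research techniques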

Skyler Ruiter, Seth Wolfgang, Marc Tunnell, Timothy Triche, Erin Carrier, Zachary DeBruine. Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data [EB/OL]. (2025-06-30) [2025-07-16]. https://arxiv.org/abs/2309.04355.