
Research on an Adaptive Precision-Preserving Lossless Compression Algorithm for Floating-Point Time Series

Abstract (Chinese version, then the authors' English version)

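Existing lossless compression algorithms for floating-point time series fall into two categories: XOR-based algorithms and decimal-based algorithms. XOR-based algorithms XOR each pair of adjacent floating-point values to remove the redundant part of their IEEE 754 representations, then use coding techniques to reuse the leading-zero count and the length of the central significant bits of the XOR result, thereby compressing the values. However, two floating-point values that are numerically close can still differ substantially in their IEEE 754 representations, so such reuse is rare in practice. As a result, these algorithms compress some data poorly and may even fail to compress it effectively at all. Decimal-based algorithms multiply each binary floating-point value by 10^n to convert it to an integer, which is then compressed. However, when the decimal precision varies across values, these algorithms can require considerably more storage. This paper proposes APCF, an adaptive precision-preserving lossless compression algorithm for floating-point time series. APCF has two main stages: preprocessing and encoding. In the preprocessing stage, the algorithm first adaptively computes the precision of each value's decimal representation and then quantizes the value according to that precision. An XOR operation then removes redundant information from the original data and concentrates the effective information in the low-order bits. In the encoding stage, the algorithm encodes the precision values with a run-length encoder and the XOR values with an improved XOR-value encoder. Experimental results show that APCF achieves an average compression ratio of 0.23 on 18 datasets. Compared with the ALP algorithm, APCF improves the average compression ratio by 18.2%; compared with the ELF algorithm, it improves the average compression ratio by 19.9%. The project source code is available at https://github.com/xiaoYu0103/osptBWT.git.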
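The contrast between the two families can be made concrete with a small sketch. It is an illustration only, not code from the paper: the sample values 3.17 and 3.18, the scale factor 100, and the helper names `float_to_bits` and `xor_stats` are all assumptions chosen for the example. It XORs two numerically close values once on their raw IEEE 754 bit patterns and once after scaling them to integers, and reports how many significant bits remain in each XOR result.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_stats(a: int, b: int, width: int = 64) -> tuple[int, int, int]:
    """Return (leading_zeros, trailing_zeros, significant_bits) of a ^ b."""
    x = a ^ b
    if x == 0:
        # Identical inputs: nothing to store beyond a "same value" flag.
        return width, 0, 0
    leading = width - x.bit_length()
    trailing = (x & -x).bit_length() - 1  # index of the lowest set bit
    return leading, trailing, width - leading - trailing


# Hypothetical neighbouring samples with two decimal places each.
prev, curr = 3.17, 3.18

# XOR on the raw IEEE 754 bit patterns (the XOR-based family).
raw = xor_stats(float_to_bits(prev), float_to_bits(curr))

# XOR after scaling by 10^2 to integers (the decimal-based idea).
scaled = xor_stats(round(prev * 100), round(curr * 100))  # 317 ^ 318 == 3

print("raw    (leading, trailing, significant):", raw)
print("scaled (leading, trailing, significant):", scaled)  # (62, 0, 2)
```

On the raw bit patterns most of the mantissa survives the XOR, whereas the scaled integers differ only in their two lowest bits, which is the effect the abstract describes as concentrating the effective information in the low-order bits.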

Existing lossless compression schemes for floating-point time series fall into two families: XOR-based techniques and decimal-based techniques. XOR-based schemes reduce redundancy by XORing successive IEEE 754 floating-point values and then exploiting repeated leading-zero counts and center-bit lengths in the XOR results. However, numerically close values can have substantially different IEEE 754 bit patterns, which limits how much redundancy these schemes can exploit in practice and can lead to poor compression ratios or no effective compression at all. Decimal-based schemes convert binary floating-point numbers to integers by scaling them by 10^n, but they tend to incur extra storage overhead when decimal precision varies across values. This paper proposes APCF, an Adaptive Precision Preservation framework for Lossless Compression of Floating-Point Time Series. APCF comprises two phases: preprocessing and encoding. In the preprocessing phase, the algorithm adaptively determines the decimal precision of each value, quantizes the value accordingly, and applies a bitwise XOR that concentrates the residual information in the least significant bits. The encoding phase uses run-length encoding (RLE) to compress the precision metadata and a modified Gorilla encoder for the XOR residuals. Evaluations on 18 time-series datasets show that APCF achieves a mean compression ratio of 0.23. Relative to the state-of-the-art compressors ALP and ELF, APCF improves the compression ratio by 18.2% and 19.9%, respectively. The source code is available at https://github.com/liuyuxicici/APCF.git.
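The precision-related steps named in the abstract (per-value decimal precision, quantization to an integer, run-length encoding of the precision values) can be sketched as follows. This is one plausible reading under stated assumptions, not the APCF implementation: the function names, the 17-digit search limit, and the sample series are hypothetical, and the abstract does not specify how APCF actually detects precision or lays out its output.

```python
from itertools import groupby

MAX_DIGITS = 17  # assumption: search limit for decimal places in this sketch


def decimal_precision(value: float, max_digits: int = MAX_DIGITS) -> int:
    """Smallest n such that rounding value to n decimal places is exact."""
    for n in range(max_digits + 1):
        if round(value, n) == value:
            return n
    raise ValueError(f"{value!r} needs more than {max_digits} decimal places")


def to_scaled_int(value: float, n: int) -> int:
    """Quantize value to an integer at n decimal places."""
    return round(value * 10**n)


def from_scaled_int(q: int, n: int) -> float:
    """Invert to_scaled_int; integer true division is correctly rounded."""
    return q / 10**n


def run_length_encode(symbols: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive equal symbols into (symbol, count) pairs."""
    return [(s, len(list(group))) for s, group in groupby(symbols)]


def run_length_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (symbol, count) pairs back into the original sequence."""
    return [s for s, count in runs for _ in range(count)]


# Hypothetical sensor readings with mixed decimal precision.
series = [20.1, 20.3, 20.2, 20.25, 20.35, 20.4]

precisions = [decimal_precision(v) for v in series]
scaled = [to_scaled_int(v, n) for v, n in zip(series, precisions)]
runs = run_length_encode(precisions)

# Round trip: decode the precisions, then rebuild every float exactly.
restored = [from_scaled_int(q, n)
            for q, n in zip(scaled, run_length_decode(runs))]

print(precisions)          # [1, 1, 1, 2, 2, 1]
print(runs)                # [(1, 3), (2, 2), (1, 1)]
print(restored == series)  # True
```

Because precision tends to stay constant over stretches of a series, the precision sequence collapses into a few runs, which is presumably why a run-length encoder suits that stream. The exact round trip here relies on the scaled integers staying well below 2^53; a real compressor would need a fallback for values outside that range.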

Authors: Liu Yuxi (刘于溪), Qu Youli (瞿有利)

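Author affiliation: Key Laboratory of Big Data and Artificial Intelligence in Transportation (Ministry of Education), Beijing Jiaotong University, Beijing 100044 (listed for both authors)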

Subject classification: Computing technology, computer technology

Keywords (translated from Chinese): floating-point time series; lossless compression; adaptive precision

Keywords (English): Floating-Point Time Series; Lossless Compression; Adaptive Precision Preservation

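Recommended citation: Liu Yuxi, Qu Youli. Research on an Adaptive Precision-Preserving Lossless Compression Algorithm for Floating-Point Time Series [EB/OL]. (2025-04-23) [2025-04-24]. http://www.paper.edu.cn/releasepaper/content/202504-194.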
