mzMLb: a future-proof raw mass spectrometry data format based on standards-compliant mzML and optimized for speed and storage requirements
Deutsch Eric W (1), Jones Andrew R (2), Dowsey Andrew W (3), Jankevics Andris (4), Bhamber Ranjeet S. (3)
Author information
- 1. Institute for Systems Biology
- 2. Institute of Integrative Biology, University of Liverpool
- 3. Department of Population Health Sciences and Bristol Veterinary School, University of Bristol BS8 2BN
- 4. School of Biosciences and Phenome Centre Birmingham, University of Birmingham
Abstract
With ever-increasing amounts of data produced by mass spectrometry (MS) proteomics and metabolomics, and the sheer volume of samples now analyzed, the need for a common open format possessing both file size efficiency and faster read/write speeds has become paramount to drive the next generation of data analysis pipelines. The Proteomics Standards Initiative (PSI) has established a clear and precise XML representation for data interchange, mzML, which has received substantial uptake; nevertheless, storage and file access efficiency have not been its main focus. We propose an HDF5 file format ‘mzMLb’ that is optimised for both read/write speed and storage of the raw mass spectrometry data. We provide extensive validation of write speed, random read speed and storage size, demonstrating a flexible format that, with or without compression, is faster than all existing approaches in virtually all cases and, with compression, is comparable in size to proprietary vendor file formats. Since our approach uniquely preserves the XML encoding of the metadata, the format implicitly supports future versions of mzML and is straightforward to implement: mzMLb’s design adheres to both HDF5 and NetCDF4 standard implementations, which allows it to be easily utilised by third parties due to their widespread programming language support. A reference implementation within the established ProteoWizard toolkit is provided.
Key words
Proteomics Standards Initiative/mzML/Mass Spectrometry/proteomics/metabolomics/data compression/HDF5
Cite this article
Deutsch Eric W, Jones Andrew R, Dowsey Andrew W, Jankevics Andris, Bhamber Ranjeet S. mzMLb: a future-proof raw mass spectrometry data format based on standards-compliant mzML and optimized for speed and storage requirements [EB/OL]. (2025-03-28) [2026-04-08]. https://www.biorxiv.org/content/10.1101/2020.02.13.947218.
Subject classification
Research methods and techniques in the biological sciences / Computing technology, computer technology