Detalle Publicación

mspack: efficient lossless and lossy mass spectrometry data compression

Título de la revista: BIOINFORMATICS
ISSN: 1367-4803
Volumen: 37
Número: 21
Páginas: 3923 - 3925
Fecha de publicación: 2021
Resumen:
Motivation: Mass spectrometry (MS) data, used for proteomics and metabolomics analyses, have seen considerable growth in the last years. Aiming at reducing the associated storage costs, dedicated compression algorithms for MS data have been proposed, such as MassComp and MSNumpress. However, these algorithms focus on either lossless or lossy compression, respectively, and do not exploit the additional redundancy existing across scans contained in a single file. We introduce mspack, a compression algorithm for MS data that exploits this additional redundancy and that supports both lossless and lossy compression, as well as the mzML and the legacy mzXML formats. mspack applies several preprocessing lossless transforms and optional lossy transforms with a configurable error, followed by the general purpose compressors gzip or bsc to achieve a higher compression ratio. Results: We tested mspack on several datasets generated by commonly used MS instruments. When used with the bsc compression backend, mspack achieves on average 76% smaller file sizes for lossless compression and 94% smaller file sizes for lossy compression, as compared with the original files. Lossless mspack achieves 10-60% lower file sizes than MassComp, and lossy mspack compresses 36-60% better than the lossy MSNumpress, for the same error, while exhibiting comparable accuracy and running time. Supplementary information: Supplementary data are available at Bioinformatics online.