GABAC: an arithmetic coding solution for genomic data.

Authors: Voges, J.; Paridaens, T.; Müntefering, F.; Mainzer, L.S.; Bliss, B.; Yang, M.; Ochoa Álvarez, Idoia; Fostier, J.; Ostermann, J.; Hernáez Arrazola, Mikel
Journal title: BIOINFORMATICS
ISSN: 1367-4803
Publication date: 2020
MOTIVATION: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the MPEG-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.

RESULTS: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.

AVAILABILITY: The GABAC library is written in C++. We also provide a command line application which exercises all features provided by the library. GABAC can be downloaded from

SUPPLEMENTARY INFORMATION: Supplementary data, including a complete list of the funding mechanisms and acknowledgements, are available at Bioinformatics online.