iDoComp: a compression scheme for assembled genomes

Authors: Ochoa Álvarez, Idoia (Corresponding author); Hernáez Arrazola, Mikel; Weissman, T.
Journal: BIOINFORMATICS
ISSN: 1367-4803
Volume: 31
Issue: 5
Pages: 626 - 633
Publication date: 2015
Motivation: With the release of the latest next-generation sequencing (NGS) machine, the HiSeq X by Illumina, the cost of sequencing a human has dropped to a mere $4000. Thus we are approaching a milestone in the history of sequencing, known as the $1000 genome era, where the sequencing of individuals is affordable, opening the doors to effective personalized medicine. Massive generation of genomic data, including assembled genomes, is expected in the coming years. There is a crucial need for genome compression that is guaranteed to perform well simultaneously on different species, from simple bacteria to humans, which will ease the transmission, dissemination and analysis of genomes. Further, most of the new genomes to be compressed will correspond to individuals of a species for which a reference already exists in the database. Thus, it is natural to propose compression schemes that assume and exploit the availability of such references.

Results: We propose iDoComp, a compressor of assembled genomes presented in FASTA format that compresses an individual genome using a reference genome for both the compression and the decompression. In terms of compression efficiency, iDoComp outperforms previously proposed algorithms in most of the studied cases, with comparable or better running time. For example, we observe compression gains of up to 60% in several cases, including H. sapiens data, when compared with the best compression performance among the previously proposed algorithms.
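
The general idea of compressing a genome against a reference that is available at both the compressor and the decompressor can be illustrated with a minimal sketch. This is not the iDoComp algorithm, whose details are not given in the abstract. It is a simple greedy matcher written under our own assumptions: the function names (`build_index`, `encode`, `decode`), the seed length `k`, and the `(position, length, literal)` output format are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical output unit: (start in reference, match length, next literal or "")
Match = Tuple[int, int, str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def encode(target: str, reference: str, k: int = 4) -> List[Match]:
    """Greedily describe the target as copies from the reference plus literals."""
    index = build_index(reference, k)
    out: List[Match] = []
    i = 0
    while i < len(target):
        best_pos, best_len = 0, 0
        # Try every reference position that shares the next k-mer of the target.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        i += best_len
        # The symbol after the match (if any) is stored explicitly.
        literal = target[i] if i < len(target) else ""
        out.append((best_pos, best_len, literal))
        if literal:
            i += 1
    return out


def decode(matches: List[Match], reference: str) -> str:
    """Rebuild the target from the matches and the same reference."""
    return "".join(reference[p:p + n] + c for p, n, c in matches)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"  # differs from the reference by one substitution
matches = encode(target, reference)
restored = decode(matches, reference)
```

In this toy case the 15-symbol target is described by two tuples, because it differs from the reference in a single position. A practical compressor would additionally entropy-code the tuples and handle FASTA headers and line lengths, which this sketch leaves out.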