QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing

Journal name
F1000Research
Frédéric Jarlier, Nicolas Joly, Nicolas Fedy, Thomas Magalhaes, Leonor Sirotti, Paul Paganiban, Firmin Martin, Michael McManus, Philippe Hupé
Abstract

Life science has entered the so-called 'big data era', in which biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into the structure of the genome, their ever-growing size raises major challenges for their use in daily clinical practice, care and diagnosis. We therefore implemented software that reduces the time to delivery for the alignment and sorting of high-throughput sequencing data. Our solution is implemented using the Message Passing Interface and is intended for high-performance computing architectures. The software scales linearly with the size of the data and ensures full reproducibility with respect to traditional tools. For example, a 300X whole genome can be aligned and sorted in less than 9 hours with 128 cores. The software offers significant speed-up through multi-core and multi-node parallelization.