Electronic Thesis and Dissertation Repository

Degree

Master of Science

Program

Computer Science

Supervisor

Dr. Lucian Ilie

Abstract

Motivation: High-throughput Next Generation Sequencing (NGS) technologies can sequence the genome of a species quickly and cheaply. However, the errors these technologies introduce limit the potential of the applications that rely on their data. Current error correction techniques are not sufficient, and a more efficient and accurate program is needed.

Results: We have designed and implemented RACER (Rapid Accurate Correction of Errors in Reads), an error correction program that targets the Illumina genome sequencer, currently the dominant NGS technology. RACER combines advanced data structures with an intricate analysis of the data to achieve high performance. It is implemented in C++, with OpenMP for parallelization. We performed extensive testing on a variety of real data sets to compare RACER with the current leading programs. RACER outperforms all of these programs in time, space, and accuracy: it corrects up to twice as many errors as the other parallel programs while being one order of magnitude faster. We hope RACER will become a useful tool for the many applications that use NGS data.
