Electronic Thesis and Dissertation Repository


Master of Science


Computer Science


Ilie, Lucian


The advent of next-generation sequencing technologies has allowed for the bridging of wet lab work and large data analysis into a cohesive work flow; with the increasing speed and efficiency of sequencing organisms, it becomes imperative that we are able to ensure the data that is produced is correct.

We designed and implemented a new algorithm, QUESS, which based on using multiple spaced seeds to correct DNA sequencing data from Illumina MiSeq, HiSeq and NextSeq machines using C++ and OpenMP for parallel computing. We compared our method with ten leading programs, producing consistently better overall results for most tested measures. QUESS has the best average performance for all programs tested and is also competitive in terms of time and space complexity.