Date of Award
2011
Degree Type
Thesis
Degree Name
Master of Science
Program
Computer Science
Supervisor
Dr. Lucian Ilie
Second Advisor
Dr. Silvana Ilie
Abstract
The main goal of homology search is to find similar segments, or local alignments, be tween two DNA or protein sequences. Since the dynamic programming algorithm of Smith- Waterman is too slow, heuristic methods have been designed to achieve both efficiency and accuracy. Seed-based methods were made well known by their use in BLAST, the most widely used software program in biological applications. The seed of BLAST trades sensitivity for speed and spaced seeds were introduced in PatternHunter to achieve both. Several seeds are better than one and near perfect sensitivity can be obtained while maintaining the speed. There fore, multiple spaced seeds quickly became the state-of-the-art in similarity search, being em ployed by many software programs. However, the quality of these seeds is crucial and comput ing optimal multiple spaced seeds is NP-hard. All but one of the existing heuristic algorithms for computing good seeds are exponential. Our work has two main goals. First we engineer the only existing polynomial-time heuristic algorithm to compute better seeds than any other program, while running orders of magnitude faster. Second, we estimate its performance by comparing its seeds with the optimal seeds in a few practical cases. In order to make the computation feasible, a very fast implementation of the sensitivity function is provided.
Recommended Citation
Bigvand, Anahita Mansouri, "A FAST ALGORITHM FOR COMPUTING HIGHLY SENSITIVE MULTIPLE SPACED SEEDS" (2011). Digitized Theses. 3662.
https://ir.lib.uwo.ca/digitizedtheses/3662