Electronic Thesis and Dissertation Repository

Computation of Sensitive Multiple Spaced Seeds

Arnab Mallik, The University of Western Ontario

Abstract

Similarity search is one of the most important problems in bioinformatics, with applications in read mapping, homology search, oligonucleotide design, and more. Because similarity search is time- and memory-intensive, heuristic methods using multiple spaced seeds are commonly employed. A spaced seed is a string of 1s and *s, where 1 represents a match position and * represents a don't-care position. Seeds are used to discover regions of identity; it is therefore imperative to design seeds of high sensitivity, so as to maximize the number of hits.
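
To make the definition concrete, the following minimal sketch shows how a single spaced seed hits an ungapped alignment of two sequences: a hit occurs at an offset when the two sequences agree at every 1 position of the seed, while * positions are ignored. The function name `seed_hits` and the example sequences are illustrative assumptions; this is not the SpEED2 algorithm, which concerns the design of seeds rather than their application.

```python
def seed_hits(seed: str, s: str, t: str) -> list[int]:
    """Return the offsets at which `seed` hits the ungapped alignment of s and t.

    Hypothetical helper for illustration only (not part of SpEED2).
    '1' marks a position that must match; '*' marks a don't-care position.
    """
    if set(seed) - {"1", "*"}:
        raise ValueError("seed must contain only '1' and '*'")

    # Only the match positions of the seed constrain a hit.
    match_positions = [k for k, c in enumerate(seed) if c == "1"]

    # Compare the sequences position by position over their common length.
    n = min(len(s), len(t))
    hits = []
    for i in range(n - len(seed) + 1):
        if all(s[i + k] == t[i + k] for k in match_positions):
            hits.append(i)
    return hits


if __name__ == "__main__":
    s = "ACGTACGT"
    t = "ACCTACGA"  # differs from s at positions 2 and 7
    print(seed_hits("11*1", s, t))  # spaced seed with three match positions
    print(seed_hits("111", s, t))   # contiguous seed with three match positions
```

In this example both seeds have three match positions, but the spaced seed also hits at offset 0 because its don't-care position absorbs the mismatch at position 2. A set of multiple spaced seeds reports a hit whenever any one of its seeds hits.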

We present SpEED2, a software program that generates multiple spaced seeds of high sensitivity. It uses a novel seed optimization approach and outperforms all the leading programs for designing multiple spaced seeds, such as Iedera, AcoSeeD, and rasbhari. Our algorithm will benefit the many software tools that depend on good-quality seeds, such as PatternHunter for similarity search, SHRiMP and BFAST for read mapping, bestPrimer for primer design, and many more.