Thesis Format

Monograph

Computation of Sensitive Multiple Spaced Seeds

Arnab Mallik, The University of Western OntarioFollow

Degree

Master of Science

Program

Computer Science

Supervisor

Ilie, Lucian

Abstract

Similarity search is one of the most important problem in bioinformatics, with application in read mapping, homology search, oligonucleotide design, etc. Similarity search is time and memory intensive, hence heuristic methods using multiple spaced seeds are commonly employed. A spaced seed is a string of 1 and *, where 1 represents a match position and * represent don't care position. Seeds are used to discover regions with identity, thus, it is imperative to design seeds of high sensitivity, so as to maximize the number of hits.

We present SpEED2, a software program to generate multiple spaced seeds of high sensitivity. It uses a novel seed optimization approach and it outperforms all the leading programs used for designing multiple spaced seeds like Iedera, AcoSeeD, and rasbhari. Our algorithm will benefit several software that is dependent on good quality seeds for its operation like PatternHunter for similarity search, SHRiMP and BFAST for read mapping, bestPrimer for designing primers, and many more.

Summary for Lay Audience

Multiple spaced seed is a set of one or more spaced seeds and they are used to find similar regions between two biological sequences. A spaced seed is a string of 1 and *, where 1 represents a match position and * represent don't care position. Two sequences are similar if they are highly identical. When a spaced seed is arranged with the two biological sequences to identify regions of similarity, a hit occurs only if both the sequences are identical in all the match positions. However, we are not interested in the don't care positions. A measurement metric called sensitivity is used to differentiate good seeds from bad ones. Sensitivity is the probability of detecting similar regions between sequences, good multiple spaced seeds have high sensitivity whereas bad ones will have low sensitivity.

We have developed a software program named SpEED2 that generates multiple spaced seeds of high sensitivity. In this dissertation, we first look into previous related works, then we explain our algorithm and finally compare our results with other leading software concerned with generating multiple spaced seeds and show our software program to be better than the others.

Recommended Citation

Mallik, Arnab, "Computation of Sensitive Multiple Spaced Seeds" (2020). Electronic Thesis and Dissertation Repository. 6977.
https://ir.lib.uwo.ca/etd/6977

Download

COinS

Thesis Format

Computation of Sensitive Multiple Spaced Seeds

Degree

Program

Supervisor

Abstract

Summary for Lay Audience

Recommended Citation

Links

Browse

Author Corner

Links

Thesis Format

Computation of Sensitive Multiple Spaced Seeds

Author

Degree

Program

Supervisor

Abstract

Summary for Lay Audience

Recommended Citation

Share

Links

Browse

Author Corner

Links