Electronic Thesis and Dissertation Repository

A New Approach to Sequence Local Alignment: Normalization with Concave Functions

Qiang Zhou, The University of Western Ontario

Abstract

Sequence local alignment is to find two subsequences from the input two sequences respectively, which can produce the highest similarity degree among all other pairs of subsequences. The Smith-Waterman algorithm is one of the most important technique in sequence local alignment, especially in computational molecular biology. This algorithm can guarantee that the optimal local alignment can be found with respect to the distance or similarity metric. However, the optimal solution obtained by Smith-Waterman is not biologically meaningful, since it may contain small pieces of irrelevant segments, but as long as they are not strong enough, the algorithm still take them in account as part of the optimal solution.

In this thesis, we propose a new algorithm firstly focusing on finding segment with highest similarity degree rather than similarity score, and save all efficient optimal solutions for any subsequence pairs ending at specific positions for later use. Thereby, there is no need to run the dynamic programming over again if different normalization functions are applied for different applications. Also, several similarity score normalization methods will be introduced and discussed in this thesis. At last, a phylogenetic tree which represents the evolutionary relationships over 20 species is built, based on the data retrieved from our algorithm.