Master of Science
Sequence alignment is one of the most accomplished methods in the field of bioinformatics, being crucial to determine similarities between sequences, from finding genes to predicting functions. The computation of Maximal Exact Matches (MEM) plays a fundamental part in some algorithms for sequence alignment. MEMs between a reference-query genome are often utilized as seeds in a genome aligner to increase its efficiency. The MEM computation is a time consuming step in the sequence alignment process and increasing the performance of this step increases significantly the whole process of the alignment between the sequences. As of today, there are many programs available for MEM computing, from algorithms based full text indexes, like essaMEM; to more effective ones, such as E-MEM, copMEM and bfMEM. However, none of the available programs for the computation of MEMs are able to work with highly related sequences. In this study, we propose an improved version, E-MEM2, of the well known MEM computing software, E-MEM. With a trade-off between time and memory, the improved version shows to run faster than its previous version, presenting very large improvements when comparing closely-related sequences.
Summary for Lay Audience
To understand relationships between species and the corresponding differences in their DNA, scientists need to use tools such as sequence aligners that can pinpoint these relationships. In some algorithms for sequence alignments, Maximal Exact Matches (MEM) are used as starting points to increase their efficiency. The process is very time consuming and improving the speed in one of its slowest steps benefits the whole alignment performance speed. There are many programs available for MEM computing, ranging from full text index algorithms, like essaMEM; to more effective ones, such as E-MEM, copMEM and bfMEM. In this study, we present an second and faster version of the known algorithm E-MEM, E-MEM2. E- MEM2 achieves a much better performance when MEMs of highly similar sequences are used, which cannot be accomplished by the other available programs.
Portes de Cerqueira Cesar, Valeria Leticia, "Effiicient Computation of Maximal Exact Matches Between Genomic Sequences" (2020). Electronic Thesis and Dissertation Repository. 6837.