Degree
Doctor of Philosophy
Program
Computer Science
Supervisor
Prof. Lila Kari
Abstract
In an attempt to identify and classify species based on genetic evidence, we propose a novel combination of methods to quantify and visualize the interrelationships between thousands of species. We use the Chaos Game Representation (CGR) of DNA sequences to compute genomic signatures, which we then compare by computing pairwise distances. In the last step, the original DNA sequences are embedded in a high-dimensional space using Multi-Dimensional Scaling (MDS), and the result is projected onto a three-dimensional Euclidean space.
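The pipeline described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the corner assignment for the four nucleotides, the grid resolution `k`, the choice of Euclidean distance, the use of classical (eigendecomposition-based) MDS, and the four toy sequences are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

# Assumed corner layout of the unit square for the chaos game.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_signature(seq, k=3):
    """Count CGR points on a 2^k x 2^k grid (one point per k-mer)."""
    size = 2 ** k
    grid = np.zeros((size, size))
    x, y = 0.5, 0.5
    run = 0  # consecutive valid nucleotides seen so far
    for ch in seq.upper():
        if ch not in CORNERS:
            # Ambiguous symbol (e.g. N): restart the game from the centre.
            x, y, run = 0.5, 0.5, 0
            continue
        cx, cy = CORNERS[ch]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the corner
        run += 1
        if run >= k:
            # Clamp the index: rounding can push a coordinate to exactly 1.0.
            row = min(int(y * size), size - 1)
            col = min(int(x * size), size - 1)
            grid[row, col] += 1
    return grid


def pairwise_distances(signatures):
    """Euclidean distance between normalized, flattened signatures."""
    flat = np.array([s.ravel() / s.sum() for s in signatures])
    diff = flat[:, None, :] - flat[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, dims=3):
    """Classical MDS: double-centre the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:dims]      # largest eigenvalues first
    vals = np.clip(vals[order], 0.0, None)     # discard tiny negative values
    return vecs[:, order] * np.sqrt(vals)


# Made-up toy sequences, not real NCBI data.
toy_sequences = [
    "ACGTACGTTAGCCGATAGCTAGGATCCA",
    "ACGTACGATAGCCGATAGCTTGGATCCA",
    "GGGCGCGCCGGGCCCGCGGCGCGGGCCC",
    "ATATTTAAATATATTAAATTTATATAAT",
]
signatures = [cgr_signature(s, k=3) for s in toy_sequences]
distances = pairwise_distances(signatures)
points_3d = classical_mds(distances, dims=3)  # one 3D point per sequence
```

Each row of `points_3d` is the position of one sequence in the resulting three-dimensional map, so sequences with similar oligomer composition end up close together. Any other distance between signatures could be substituted in `pairwise_distances` without changing the rest of the sketch.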
To start with, we apply this method to a mitochondrial DNA dataset from NCBI containing over 3,000 species. The analysis shows that the oligomer composition of full mtDNA sequences can be a source of taxonomic information, suggesting that this method could be used to classify unclassified species and to help resolve taxonomic controversies.
Next, we test the hypothesis that the CGR-based genomic signature is preserved along a species' genome by comparing inter- and intra-genomic signatures of nuclear DNA sequences from six different organisms, one from each kingdom of life. We also compare six different distances and assess their performance using statistical measures. Our results support the existence of a genomic signature for a species' genome at the kingdom level.
In addition, we test whether CGR-based genomic signatures originating only from nuclear DNA can be used to distinguish between closely related species, and we answer in the negative. To overcome this limitation, we propose the concept of "composite signatures", which combine information from different types of DNA, and we show that they can effectively distinguish all closely related species under consideration. We also propose the concept of "assembled signatures", which, among other advantages, do not require a long contiguous DNA sequence but can be built from smaller ones consisting of ~100-300 base pairs.
Finally, we design an interactive web tool, MoDMaps3D, for building three-dimensional Molecular Distance Maps. The user can explore an existing map or build their own using NCBI accession numbers as input. MoDMaps3D is platform-independent, written in JavaScript, and runs in all major modern browsers.
Recommended Citation
Karamichalis, Rallis, "Molecular Distance Maps: An alignment-free computational tool for analyzing and visualizing DNA sequences' interrelationships" (2016). Electronic Thesis and Dissertation Repository. 4071.
https://ir.lib.uwo.ca/etd/4071