Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Doctor of Philosophy




Gloor, Gregory B.


An organism's genome is the ultimate determinant of its functional potential. Understanding genomes is therefore essential to understanding function, and a foundational knowledge of a genome is required to transfer functions to and from microorganisms of interest. Sequencing DNA using nanopores is a recent advance that resolves limitations of previous technologies, enabling an improved understanding of genomes. For this thesis, I improved our understanding of microbial genomes by developing novel approaches to analyze long-read sequencing data, setting the foundation for future synthetic biology work.

Long sequencing reads have enabled routine assembly of complete bacterial genomes by directly sequencing DNA extracted from bacterial communities. I showed that visualizing sequencing coverage after filtering read alignments using a 95% query coverage cutoff (i.e., nearly the entire read aligns to the genome) enabled the detection of mis-assemblies. I also showed that this approach can be applied to detect recoverable alternate haplotypes containing important functional elements. Furthermore, I used this approach to demonstrate that a circular genome for a novel species of Saccharibacteria, enriched from a heavy metal-polluted Northern Albertan tailings pond, contains a recently acquired genomic island. I also determined that this genomic island encodes heavy metal resistance genes, suggesting that horizontal gene transfer may be possible under selective pressure in Saccharibacteria.
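The query coverage filter described above can be illustrated with a minimal sketch. It assumes alignments are available as PAF records (the tab-separated format written by minimap2), which the abstract does not specify; the function names and the two example records are hypothetical and are not taken from the thesis. Query coverage is computed here as the aligned span of the read divided by the read length.

```python
from typing import Iterable, Iterator


def query_coverage(paf_line: str) -> float:
    """Return the fraction of the read (query) spanned by one PAF alignment."""
    fields = paf_line.rstrip("\n").split("\t")
    query_length = int(fields[1])  # PAF column 2: query sequence length
    query_start = int(fields[2])   # PAF column 3: query start (0-based)
    query_end = int(fields[3])     # PAF column 4: query end (exclusive)
    return (query_end - query_start) / query_length


def filter_by_query_coverage(
    paf_lines: Iterable[str], min_coverage: float = 0.95
) -> Iterator[str]:
    """Yield only the alignments whose query coverage meets the cutoff."""
    for line in paf_lines:
        if line.strip() and query_coverage(line) >= min_coverage:
            yield line


# Hypothetical records: one read aligned almost end to end, one aligned
# over only 60% of its length (as might happen at an assembly breakpoint).
example_records = [
    "read_full\t10000\t100\t9900\t+\tcontig_1\t500000\t2000\t11800\t9500\t9800\t60",
    "read_clipped\t10000\t0\t6000\t+\tcontig_1\t500000\t40000\t46000\t5800\t6000\t60",
]

kept = list(filter_by_query_coverage(example_records))
for record in kept:
    print(record.split("\t")[0], round(query_coverage(record), 2))
```

Under these assumptions, reads that align only partially are discarded before coverage is computed, so a region where many reads fail the cutoff would be expected to show up as a drop in the filtered coverage plot.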

Another track of my thesis focused on applying nanopore sequencing to a marine diatom, Phaeodactylum tricornutum, which is of significant interest for synthetic biology applications like producing low-cost glycosylated proteins. This species does not have a complete genome assembly, despite a draft sequence being available since 2008. To determine the full structure of the genome, I used ultra-long sequencing reads to build a telomere-to-telomere genome assembly. I also developed a novel, assembly-free approach, which I term long-read karyocounting, to determine the number of chromosomes in eukaryotes directly from nanopore sequencing reads, as an orthogonal method to validate the assembly.

These studies provide complete genome assemblies for both novel bacterial species and a marine diatom whose genome structure had yet to be resolved. These approaches also demonstrate that there is more information encoded in long-read sequencing data than just the sum of assembled sequence.

Summary for Lay Audience

The code for life is written in every living organism's DNA as a unique combination of 4 chemical letters. This combination, called the DNA sequence, determines what the living being is capable of. Technology to characterize DNA sequences has improved dramatically since 2014 with the invention of "nanopore" DNA sequencing, where DNA is pulled through a tiny pore for characterization. The main improvement is that the full size of a piece of DNA can be characterized. For my thesis, I improved our understanding of DNA sequences for bacteria and algae by developing new ways to analyze nanopore data, setting the foundation for future research with these organisms.

Nanopore sequencing is improving how complete a DNA sequence can be. For example, while the first human DNA sequence was published in 2001, it was not actually completed until 2021. This new technology comes with new analysis challenges. I developed a filtering and visualization method using the sequences to find analysis errors. I also showed that this same technique can be used to uncover alternate versions of the DNA sequence when more than one exists. Furthermore, I used these visuals to show that a recently discovered bacterium from the Canadian oil sands contained a region of DNA that can move itself from one bacterium to another. This region contained a DNA sequence that is known to help cells pump toxic metals out, suggesting the bacterium may be capable of acquiring new DNA regions to survive.

A separate track of my thesis focused on better understanding an alga with significant commercial interest because it can be used to make low-cost proteins like the SARS-CoV-2 proteins required for rapid COVID-19 testing kits. Although a DNA sequence for this alga was published in 2008, it was not complete. In this thesis, I created the first complete DNA sequence for this alga. I also developed a separate analysis method to determine how many chromosomes exist.

Overall, this thesis provides more complete DNA sequences for several new bacteria, and completes the DNA sequence for a commercially valuable alga. The analysis methods I developed show that there is more information encoded in the DNA sequence than just the combination of the 4 different letters.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Genomics Commons