
Applications of nanopore DNA sequencing for improved genome assembly
Abstract
An organism's genome is the ultimate determinant of its functional potential. Understanding genomes is therefore essential to understanding function, and foundational knowledge of a genome is required to transfer functions to and from microorganisms of interest. Sequencing DNA using nanopores is a recent advance that resolves limitations of previous technologies, enabling an improved understanding of genomes. For this thesis, I improved our understanding of microbial genomes by developing novel approaches to analyze long-read sequencing data, setting the foundation for future synthetic biology work.
Long sequencing reads have made it routine to assemble complete bacterial genomes directly from DNA extracted from bacterial communities. I showed that visualizing sequencing coverage after filtering read alignments using a 95% query coverage cutoff (i.e., nearly the entire read aligns to the genome) enabled the detection of mis-assemblies. I also showed that the same approach can be applied to detect recoverable alternate haplotypes containing important functional elements. Furthermore, I used this approach to demonstrate that a circular genome for a novel species of Saccharibacteria, enriched from a heavy-metal-polluted Northern Albertan tailings pond, contains a recently acquired genomic island. I also determined that this genomic island encodes heavy-metal resistance genes, suggesting that horizontal gene transfer may be possible under selective pressure in Saccharibacteria.
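The query coverage filter described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis. It assumes the alignments are stored in PAF format (as written by aligners such as minimap2), computes query coverage as the aligned fraction of each read, and tallies per-base depth from the reads that pass. The function names and the `min_qcov` parameter are hypothetical and chosen only for illustration.

```python
def filter_paf(lines, min_qcov=0.95):
    """Keep alignments whose aligned query span covers >= min_qcov of the read.

    Assumes PAF columns: 2 = query length, 3 = query start, 4 = query end,
    6 = target name, 7 = target length, 8 = target start, 9 = target end
    (coordinates are 0-based, end-exclusive).
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated records
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        if qlen > 0 and (qend - qstart) / qlen >= min_qcov:
            kept.append((fields[5], int(fields[6]), int(fields[7]), int(fields[8])))
    return kept


def depth_per_base(alignments):
    """Return {target_name: [depth at each base]} for the kept alignments."""
    diffs = {}
    for tname, tlen, tstart, tend in alignments:
        # Difference array: +1 where an alignment starts, -1 just past its end.
        diff = diffs.setdefault(tname, [0] * (tlen + 1))
        diff[tstart] += 1
        diff[tend] -= 1
    depth = {}
    for tname, diff in diffs.items():
        running = 0
        coverage = []
        for delta in diff[:-1]:
            running += delta
            coverage.append(running)
        depth[tname] = coverage
    return depth
```

Plotting the resulting depth along each contig is one way to look for the abrupt coverage drops that the filtered alignments are meant to expose; the plotting step is omitted here.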
Another track of my thesis focused on applying nanopore sequencing to a marine diatom, Phaeodactylum tricornutum, which is of significant interest for synthetic biology applications such as producing low-cost glycosylated proteins. This species does not have a complete genome assembly, despite a draft sequence being available since 2008. To determine the full structure of the genome, I used ultra-long sequencing reads to build a telomere-to-telomere genome assembly. As an orthogonal method to validate the assembly, I also developed a novel, assembly-free approach, which I term long-read karyocounting, to determine the number of chromosomes in a eukaryote directly from nanopore sequencing reads.
These studies provide complete genome assemblies for both novel bacterial species and a marine diatom whose genome structure had yet to be resolved. These approaches also demonstrate that there is more information encoded in long-read sequencing data than just the sum of assembled sequence.