
Towards more complete metagenomic analyses through circularized genomes and conjugative elements
Abstract
Advancements in sequencing technologies have revolutionized the biological sciences and led to the emergence of several new fields of research. One such field is metagenomics, the study of the genomic content of complex microbial communities. The goal of this thesis was to contribute computational methodology that maximizes the use of the data generated in these studies and to apply these protocols to human and environmental metagenomic samples.
Standard metagenomic analyses include a step for binning assembled contigs, which has previously been shown to exclude mobile genetic elements. I demonstrated that this phenomenon extends to all conjugative elements, which are a subset of mobile genetic elements. I proposed two separate methodologies for detecting contigs that are potential conjugative elements: a curated set of profile hidden Markov models that is fast to run, or annotation against the full UniRef90 database, a slower but more sensitive method.
I then applied this framework to a large population-based cohort and to a study examining the association between the maternal human gut microbiota and the development of spina bifida. Broadly, the composition and abundances of conjugative elements discriminated between the age and geographic cohorts. In the spina bifida cohort, there was an enrichment of Campylobacter hominis and of a conjugative element belonging to Campylobacter hominis, which was excluded from the metagenomic bins.
Next, I characterized a novel species belonging to the recently discovered manganese-oxidizing genus Manganitrophus growing on oil refinery carbon filters. I successfully circularized the genomes of three strains and obtained high-quality assemblies for the remaining two samples. Furthermore, I identified a previously uncharacterized conjugative plasmid belonging to the species using the framework I developed in Chapter 2.
Finally, I developed an assembly pipeline that performs a secondary assembly on binned assemblies using long reads. The secondary assemblies yielded a number of additional circularized sequences that would be useful as scaffolds in future metatranscriptomic, variation analysis, and community dynamics studies.
The methodologies and applications in this thesis provide a framework for more complete metagenomic analyses going forward that will aid in our understanding of microbial ecology.