Electronic Thesis and Dissertation Repository


Master of Science


Computer Science


Lucian Ilie


The advent of next-generation sequencing (NGS) technology makes it possible to study metagenomic data, which are extracted and cloned directly from assemblages of microorganisms. Metagenomic data are diverse in both species and abundance. Because most genome assemblers are designed to assemble a single genome, they do not perform well on metagenomic data. To handle mixed, non-uniformly distributed metagenomic reads, we developed MetaSAGE, a novel metagenomic assembler built on the existing SAGE assembler. MetaSAGE finds contigs in the overlap graph using minimum-cost flow and uses mate-pair information to extract scaffolds from the overlap graph. When it encounters chimeric nodes, MetaSAGE splits them according to the coverage of their edges. MetaSAGE performs well compared with existing metagenomic assemblers.
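The abstract does not spell out how MetaSAGE formulates its flow problem. As a generic illustration of the minimum-cost flow idea only, the sketch below computes a minimum-cost flow by successive shortest paths (Bellman-Ford on the residual graph) over a hypothetical toy graph. The class name `MinCostFlow`, the node numbering, and all capacities and costs are invented for illustration and are not the actual SAGE or MetaSAGE model. One plausible reading in an assembly setting is that the flow on an overlap edge counts how many times that overlap is traversed by assembled paths.

```python
class MinCostFlow:
    """Minimum-cost flow via successive shortest paths (Bellman-Ford).

    Generic textbook technique; NOT MetaSAGE's actual formulation.
    """

    def __init__(self, n):
        self.n = n
        # Adjacency lists; each edge is [to, residual_capacity, cost, index_of_reverse_edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        # Forward edge plus a zero-capacity reverse edge with negated cost.
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t, max_flow):
        flow, total_cost = 0, 0
        while flow < max_flow:
            # Bellman-Ford tolerates the negative costs on residual reverse edges.
            dist = [float("inf")] * self.n
            prev = [None] * self.n  # prev[v] = (u, index of edge u -> v)
            dist[s] = 0
            for _ in range(self.n - 1):
                updated = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for i, (v, cap, c, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + c < dist[v]:
                            dist[v] = dist[u] + c
                            prev[v] = (u, i)
                            updated = True
                if not updated:
                    break
            if dist[t] == float("inf"):
                break  # no augmenting path left
            # Bottleneck capacity along the shortest path.
            push = max_flow - flow
            v = t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Augment along the path, updating residual capacities.
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total_cost += push * dist[t]
        return flow, total_cost


# Hypothetical toy graph: 0 = source, 1-4 = read nodes, 5 = sink.
mcf = MinCostFlow(6)
mcf.add_edge(0, 1, 2, 0)
mcf.add_edge(1, 2, 1, 1)
mcf.add_edge(1, 3, 2, 2)
mcf.add_edge(2, 4, 1, 1)
mcf.add_edge(3, 4, 2, 1)
mcf.add_edge(4, 5, 2, 0)
flow, total_cost = mcf.solve(0, 5, 2)
print(flow, total_cost)  # 2 5
```

Here the cheaper path 0-1-2-4-5 (cost 2) is saturated first, and the second unit of flow takes 0-1-3-4-5 (cost 3). Bellman-Ford is used instead of Dijkstra so that negative-cost reverse edges need no potential function; this keeps the sketch short at the expense of speed.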