Electronic Thesis and Dissertation Repository

Degree

Master of Science

Program

Computer Science

Supervisor

Lucian Ilie

Abstract

The advent of next-generation sequencing (NGS) technology makes it possible to study metagenomic data, which are extracted and cloned directly from assemblages of microorganisms. Metagenomic data are diverse in both species and abundance. Because most genome assemblers are designed for single-genome assembly, they do not perform well on metagenomic data. To handle mixed, non-uniformly distributed metagenomic reads, we developed a novel metagenomic assembler named MetaSAGE, built on the existing SAGE assembler. MetaSAGE finds contigs in the overlap graph using minimum-cost flow theory and uses mate-pair information to extract scaffolds from the overlap graph. When it encounters chimeric nodes, MetaSAGE splits them according to edge coverage. MetaSAGE exhibits good performance compared to existing metagenomic assemblers.
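
The abstract does not state the exact rule used to split chimeric nodes. As a rough illustration only, the Python sketch below assumes that an incoming and an outgoing edge with similar read coverage belong to the same genome, and pairs them greedily. The function name, the edge labels, and the greedy pairing are all hypothetical and are not taken from MetaSAGE.

```python
from typing import Dict, List, Tuple


def split_chimeric_node(
    in_cov: Dict[str, float],
    out_cov: Dict[str, float],
) -> List[Tuple[str, str]]:
    """Pair incoming and outgoing edges of one node by closest coverage.

    Hypothetical helper: each returned (in_edge, out_edge) pair stands for
    one copy of the split node. This is an assumed greedy rule, not the
    MetaSAGE algorithm itself.
    """
    remaining = dict(out_cov)  # outgoing edges not yet assigned
    pairs: List[Tuple[str, str]] = []
    # Visit incoming edges from highest to lowest coverage.
    for in_edge, cov in sorted(in_cov.items(), key=lambda kv: -kv[1]):
        if not remaining:
            break  # more incoming than outgoing edges; leave the rest unpaired
        # Choose the unassigned outgoing edge whose coverage is closest.
        best = min(remaining, key=lambda e: abs(remaining[e] - cov))
        pairs.append((in_edge, best))
        del remaining[best]
    return pairs


# Toy node shared by a high-abundance and a low-abundance genome.
example = split_chimeric_node({"a": 50.0, "b": 8.0}, {"c": 9.0, "d": 48.0})
```

In this toy case the high-coverage edges `a` and `d` end up on one copy of the node and the low-coverage edges `b` and `c` on the other, which is the intuition behind separating genomes of different abundance.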
