Electronic Thesis and Dissertation Repository


Doctor of Philosophy


Computer Science


Kaizhong Zhang


Glycosylation is a frequently observed post-translational modification (PTM) of proteins. It has been estimated that over half of all eukaryotic proteins in nature are glycoproteins. Glycoprotein analysis plays a vital role in drug preparation, so characterizing the glycans linked to proteins has become necessary in glycoproteomics. Mass spectrometry has become an effective analytical technique for glycoproteomic analysis because of its high throughput and sensitivity. The large amount of spectral data collected in a single mass spectrometry experiment makes manual interpretation impractical and calls for effective computational approaches to automated analysis. Various algorithmic solutions have been proposed to address the challenges of mass spectrometry-based glycoproteomics. However, new algorithms that can identify intact glycopeptides are still needed to improve the accuracy of the results.

In this research, a glycan is represented as a rooted unordered labelled tree, and we focus on developing effective algorithms to determine glycan structures from tandem mass spectra. Interpreting the tandem mass spectra of glycopeptides with a de novo sequencing method is essential for identifying novel glycan structures. We therefore mathematically formulate the glycan de novo sequencing problem and propose a heuristic algorithm for glycan de novo sequencing from higher-energy collisional dissociation (HCD) tandem mass spectra of glycopeptides.
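To make the tree representation concrete, the following is a minimal sketch, not the thesis's algorithm. It assumes each node is labelled only with a monosaccharide class (no linkage or anomeric information) and uses standard monoisotopic residue masses. It ignores the reducing-end water, charge states, and the peptide mass, so the "cleavage" masses are simplified glycan-fragment residue masses rather than observed m/z values. The names `Glycan`, `canonical`, `residue_mass`, and `single_cleavage_masses` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Monoisotopic residue masses (Da) of common monosaccharide classes.
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "Fuc": 146.05791,
    "NeuAc": 291.09542,
}


@dataclass
class Glycan:
    """Node of a rooted unordered labelled tree (hypothetical illustration)."""
    label: str
    children: List["Glycan"] = field(default_factory=list)


def canonical(node: Glycan) -> str:
    """Canonical encoding: sorting the children's encodings makes sibling order irrelevant."""
    if not node.children:
        return node.label
    kids = sorted(canonical(c) for c in node.children)
    return node.label + "(" + ",".join(kids) + ")"


def residue_mass(node: Glycan) -> float:
    """Sum of residue masses over every node in the subtree rooted at `node`."""
    return RESIDUE_MASS[node.label] + sum(residue_mass(c) for c in node.children)


def single_cleavage_masses(root: Glycan) -> List[Tuple[float, float]]:
    """For each edge, return (detached subtree mass, remaining tree mass)."""
    total = residue_mass(root)
    pairs = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            detached = residue_mass(child)
            pairs.append((round(detached, 5), round(total - detached, 5)))
            stack.append(child)
    return pairs


# A core-fucosylated N-glycan core written with two different child orders.
a = Glycan("HexNAc", [Glycan("Fuc"),
                      Glycan("HexNAc", [Glycan("Hex", [Glycan("Hex"), Glycan("Hex")])])])
b = Glycan("HexNAc", [Glycan("HexNAc", [Glycan("Hex", [Glycan("Hex"), Glycan("Hex")])]),
                      Glycan("Fuc")])

print(canonical(a) == canonical(b))  # True
print(round(residue_mass(a), 4))     # 1038.3751
print(len(single_cleavage_masses(a)))  # 5, one pair per edge
```

Because the canonical encoding sorts sibling encodings, two trees that differ only in child order compare equal. This is the sense in which the glycan tree is "unordered". Cutting each of the five edges of this six-residue tree yields one pair of complementary fragment masses.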

Characterizing glycans from MS/MS spectra by de novo sequencing requires high-quality spectra to produce accurate results. Database search methods can usually obtain more reliable results because they are assisted by known glycan structural information. We therefore propose GlycoNovoDB, a de novo sequencing-assisted database search method for mass spectra interpretation.