Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Computer Science


Tandem mass spectrometry (MS/MS) is the key technology for glycopeptide identification in high-throughput, large-scale glycoproteomics. Estimating the false discovery rate (FDR) is essential for evaluating the quality of MS/MS-based identification software tools. Although numerous glycopeptide identification tools have been proposed recently, few approaches to glycopeptide FDR analysis are widely accepted, owing to the great structural diversity of glycans. The target-decoy search strategy is currently the most common method for FDR estimation of peptide-spectrum matches. In this study, we constructed decoy glycan databases by various methods and compared the FDRs estimated from the database search scores produced with each decoy glycan database. Furthermore, we employed a mixture model that helps distinguish correct from incorrect identifications in the database search score distribution, allowing a better comparison of the different decoy glycan database constructions.
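As an illustration of the target-decoy idea mentioned above, the following is a minimal sketch and not the method used in this thesis. It assumes the common simple estimator, in which the FDR at a score threshold is the number of decoy matches divided by the number of target matches at or above that threshold, and it assumes the target and decoy databases are of equal size. The function names `estimate_fdr` and `score_threshold_at_fdr` and the `(score, is_decoy)` input format are hypothetical.

```python
from typing import List, Optional, Tuple

# Each match is (search score, True if it came from the decoy database).
Match = Tuple[float, bool]


def estimate_fdr(matches: List[Match], threshold: float) -> float:
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    Assumption: target and decoy databases have equal size, so the decoy
    count approximates the number of incorrect target matches.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target matches, nothing to mislabel
    return min(1.0, decoys / targets)


def score_threshold_at_fdr(matches: List[Match],
                           max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None when no observed score satisfies the limit.
    """
    best = None
    # Walk candidate thresholds from strictest to most permissive.
    for threshold in sorted({score for score, _ in matches}, reverse=True):
        if estimate_fdr(matches, threshold) <= max_fdr:
            best = threshold
    return best


if __name__ == "__main__":
    # Invented scores for illustration only.
    example = [(9.0, False), (8.0, False), (7.0, True),
               (6.0, False), (5.0, False), (4.0, True)]
    print(estimate_fdr(example, 5.0))
    print(score_threshold_at_fdr(example, 0.25))
```

Under this estimator, the choice of decoy database directly changes the decoy score distribution and therefore the estimated FDR, which is why different decoy constructions can be compared through it.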

Summary for Lay Audience

Tandem mass spectrometry (MS/MS) is an essential tool for identifying chemical substances. Because many glycopeptide identification software tools have been developed over the past decades, a large quantity of MS/MS data can be analyzed in a single run of such software. In large-scale glycoproteomics, false discovery rate (FDR) estimation plays a vital role in evaluating the identification results produced by the software, because the results may contain incorrect assignments and checking them manually is not feasible for large datasets. Although extensive research has been carried out on FDR estimation in proteomics, few approaches to FDR analysis for glycans are widely accepted because of their structural diversity. The target-decoy search strategy is the standard method for estimating FDR in proteomics: the sequencing software searches both the real target database and a decoy database of incorrect entries. In this study, we generated different kinds of decoy glycan databases and compared how effectively each supports reasonable FDR estimation for glycopeptide identification. To compare the decoy glycan databases, we used a mixture model to differentiate correct from incorrect glycopeptide assignments.

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License