Thesis Format
Monograph
Degree
Master of Science
Program
Computer Science
Collaborative Specialization
Biostatistics
Supervisor
Kaizhong Zhang
Abstract
In recent years, glycopeptide sequencing from tandem mass spectrometry (MS/MS) data has made remarkable progress. Various software tools have been developed and are widely used for protein identification. Estimating the false discovery rate (FDR) has become essential for evaluating the performance of glycopeptide scoring algorithms. The target-decoy strategy, which involves constructing decoy databases, is currently the most widely used method for FDR calculation. In this study, we applied various decoy construction algorithms to generate decoy glycan databases and proposed a novel approach that calculates the FDR using the EM algorithm and a mixture model.
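For orientation, the conventional target-decoy estimate mentioned above can be sketched as follows. This is only a minimal illustration of the standard ratio of decoy hits to target hits at a score threshold, not the EM-based mixture-model approach proposed in the thesis. It assumes that targets and decoys are searched separately against databases of equal size and that a higher score means a better match. The function name and the score values are hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits).

    Assumes separate target and decoy searches against equally sized
    databases, with higher scores indicating better matches.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        # No accepted target matches, so there is nothing to be falsely discovered.
        return 0.0
    # Cap at 1.0 because the ratio can exceed 1 when decoys outnumber targets.
    return min(1.0, n_decoy / n_target)


# Illustrative (made-up) match scores, not data from the thesis.
target_scores = [9.1, 8.4, 7.7, 6.2, 5.0, 4.3]
decoy_scores = [6.5, 4.8, 3.9, 2.1]

fdr = target_decoy_fdr(target_scores, decoy_scores, threshold=6.0)
print(f"Estimated FDR at threshold 6.0: {fdr:.2f}")
```

With these example scores, four target matches and one decoy match pass the threshold, so the estimate is 1/4.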
Summary for Lay Audience
In recent years, a growing number of glycopeptide identification software tools have been developed that score glycopeptides against tandem mass spectrometry data in order to identify them. However, because the results may contain mistakes, false discovery rate (FDR) estimation plays a key role in assessing how much confidence to place in them. The decoy-target approach, which requires building a decoy database, is one of the most effective methods for calculating the FDR. In this study, we explored a novel method for generating decoy databases based on the probability of glycan compositions in the target database, and compared it with other decoy construction methods. In addition, since the distribution of target matches can be a mixture of correct and incorrect matches, we created a new FDR estimation approach that uses the EM algorithm with a mixture model.
Recommended Citation
Li, Xiaoou, "Decoy-Target Database Strategy and False Discovery Rate Analysis for Glycan Identification" (2023). Electronic Thesis and Dissertation Repository. 9581.
https://ir.lib.uwo.ca/etd/9581