
De novo sequencing of multiple tandem mass spectra of peptides containing SILAC labeling
Abstract
The systematic study of proteins has gradually become fundamental to research in molecular biology. Shotgun proteomics uses bottom-up techniques to identify the proteins in complex mixtures by combining high-performance liquid chromatography with tandem mass spectrometry (LC-MS/MS). Modern mass spectrometers, with their high sensitivity and accuracy, can produce thousands of tandem mass spectrometry (MS/MS) spectra in a single run. The large amount of data collected in a single LC-MS/MS run requires effective computational approaches to automate spectrum interpretation. De novo peptide sequencing from MS/MS spectra has emerged as an important technique for peptide sequencing in proteomics. However, the low identification rate of the acquired spectra limits the effectiveness of these computational approaches. To improve the accuracy and practicality of de novo sequencing, several previous algorithms combine multiple spectra to identify a peptide sequence.
In this thesis, we focus on de novo sequencing of multiple SILAC-labeled tandem mass spectra. Compared with previous approaches, our de novo sequencing algorithms are built on a different idea of how to use multiple spectra. SILAC (stable isotope labeling by amino acids in cell culture) uses growth media containing different isotope-labeled forms of essential amino acids, usually arginine (R) and lysine (K), so that newly synthesized proteins are labeled with stable isotopes during cell growth. After SILAC samples are processed by LC-MS/MS shotgun proteomics, the spectrometer produces multiple MS/MS spectra for the same peptide sequence. Based on factors such as the type of isotope labeling, retention time, and precursor ion mass, multiple spectra carrying different SILAC modifications of the same peptide can be used together to identify the peptide sequence. In this study, we aim not only to identify the peptide sequence with its specific SILAC modifications, but also to pinpoint the locations of those modifications from multiple SILAC-labeled MS/MS spectra. We propose two de novo sequencing algorithms for computing the peptide sequence: one based on the total number of SILAC modifications, and one based on the combinations of SILAC modifications on arginine (R) and lysine (K). Two dynamic programming algorithms identify the peptide sequence and locate its SILAC modifications; the resulting candidates are scored by similarity, and refinement algorithms are then applied. Finally, a confidence score is designed to evaluate all candidate sequences.
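The link between a precursor mass shift, the number of labeled K and R residues, and label locations can be illustrated with a small sketch. This is not the thesis algorithm: it assumes the common heavy labels 13C6 15N2-lysine (+8.014199 Da) and 13C6 15N4-arginine (+10.008269 Da), and the functions `label_combinations` and `fragment_shifts` are hypothetical names introduced for illustration. The first enumerates the (K, R) label counts that explain the mass difference between light and heavy precursors; the second shows why label locations are recoverable, since a b-ion or y-ion shifts only by the labels on residues it contains.

```python
from itertools import product

# Assumed heavy SILAC mass shifts in Da (Lys 13C6 15N2, Arg 13C6 15N4).
# The labels actually used in the thesis may differ.
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}


def label_combinations(light_mass, heavy_mass, max_count=5, tol=0.02):
    """Return (n_K, n_R) pairs whose total label shift explains the
    observed neutral precursor mass difference within tol Da."""
    delta = heavy_mass - light_mass
    hits = []
    for n_k, n_r in product(range(max_count + 1), repeat=2):
        if n_k == 0 and n_r == 0:
            continue  # an unlabeled pair explains no shift
        shift = n_k * HEAVY_SHIFT["K"] + n_r * HEAVY_SHIFT["R"]
        if abs(shift - delta) <= tol:
            hits.append((n_k, n_r))
    return hits


def fragment_shifts(sequence, labeled_positions):
    """For a candidate sequence and a set of 0-based labeled positions,
    return the expected mass shifts of b1..b(n-1) and y1..y(n-1)."""
    # Per-residue shift: only labeled K/R residues contribute.
    per_residue = [
        HEAVY_SHIFT[aa] if i in labeled_positions and aa in HEAVY_SHIFT else 0.0
        for i, aa in enumerate(sequence)
    ]
    n = len(sequence)
    # b_i covers the first i residues; y_i covers the last i residues.
    b = [round(sum(per_residue[:i]), 6) for i in range(1, n)]
    y = [round(sum(per_residue[n - i:]), 6) for i in range(1, n)]
    return b, y


if __name__ == "__main__":
    print(label_combinations(1000.0, 1018.022468))
    b, y = fragment_shifts("AKPEPTIDER", {1, 9})
    print(b)
    print(y)
```

For a light/heavy precursor pair 18.022468 Da apart, only one labeled K plus one labeled R fits. In the hypothetical peptide AKPEPTIDER with both residues labeled, b1 is unshifted, b2 through b9 shift by 8.014199 Da, y1 through y8 shift by 10.008269 Da, and only y9 carries both labels. Counts alone can become ambiguous at larger numbers (5 K versus 4 R differ by only about 0.038 Da), which is one reason fragment evidence from the spectra is needed to resolve label composition and location.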
To evaluate the performance of our algorithms, we compare their experimental results, and we also compare the output candidates of our approach with those of PEAKS de novo.