Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Master of Science


Computer Science


Kaizhong Zhang


De novo peptide sequencing from tandem mass spectrometry (MS) data is a key technology in proteomics for understanding the structure of proteins, especially for previously unseen sequences. Although the technique has advanced rapidly in recent years and become more effective, one crucial problem remains unsolved: because leucine and isoleucine are isomers, they are practically indistinguishable in de novo sequencing from traditional tandem MS data. Experimental approaches such as the EThCD fragmentation process have been proposed to resolve this ambiguity. In this study, we took a data-focused approach rather than looking only for the characteristic satellite ions produced by EThCD fragmentation. We used state-of-the-art deep neural networks to process raw spectral data over a broader range, searching for other, previously unknown evidence in the spectra, in the hope of distinguishing the two isomeric amino acids more reliably. We also explored the capabilities of such networks when applied to tandem MS spectral data.
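As a small illustration of why mass alone cannot separate the two residues, the following sketch computes monoisotopic residue masses from elemental composition. It is not taken from the thesis: the function names, the tolerance, and the valine comparison are assumptions added here for illustration only.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Residue composition: the free amino acid minus one water (lost when the
# peptide bond forms). Leucine and isoleucine are both C6H13NO2 as free
# amino acids, so both residues are C6H11NO.
RESIDUE_FORMULA = {
    "Leu": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Ile": {"C": 6, "H": 11, "N": 1, "O": 1},
    "Val": {"C": 5, "H": 9, "N": 1, "O": 1},  # included only for contrast
}


def residue_mass(name):
    """Sum the monoisotopic masses of all atoms in the named residue."""
    formula = RESIDUE_FORMULA[name]
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mass_distinguishable(a, b, tol=1e-6):
    """Return True if the two residues differ in mass by more than tol (Da).

    The tolerance is a hypothetical value chosen for this sketch.
    """
    return abs(residue_mass(a) - residue_mass(b)) > tol


if __name__ == "__main__":
    for name in ("Leu", "Ile", "Val"):
        print(name, round(residue_mass(name), 5))
    print("Leu vs Ile distinguishable by mass:", mass_distinguishable("Leu", "Ile"))
    print("Leu vs Val distinguishable by mass:", mass_distinguishable("Leu", "Val"))
```

Because the two residues share the composition C6H11NO, any evidence for telling them apart has to come from how they fragment rather than from their mass.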

Summary for Lay Audience

This research aims to provide a better solution to the problem of identifying two hard-to-distinguish components of a protein sequence. Leucine and isoleucine have the same molecular mass, so traditional sequencing methods, which analyze the composition of a protein sequence from mass spectrum data, cannot tell these two amino acids apart. Recent advances in mass spectrometry have made it possible to produce additional evidence in mass spectrum data that helps distinguish leucine from isoleucine, and this evidence has proven quite effective. In this research, we carried out a series of experiments in which we fed this new kind of spectrum data into state-of-the-art deep neural networks, both to further explore their capabilities and to search for new evidence that may help distinguish leucine from isoleucine more reliably.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.