
Discrimination of Leucine and Isoleucine in De Novo Peptide Sequencing Using Deep Neural Networks
Abstract
De novo peptide sequencing from tandem MS data is a key technology in proteomics for understanding the structure of proteins, especially for previously unseen sequences. Although the technique has advanced rapidly in recent years and become more effective, one crucial problem remains unsolved. Because leucine and isoleucine are isomers, they are practically indistinguishable in de novo sequencing with conventional tandem MS data. Experimental approaches, such as the EThCD fragmentation process, have been proposed to resolve this ambiguity. In this study, we took a data-driven approach rather than looking only for the characteristic satellite ions produced by EThCD fragmentation. We used deep neural networks to process raw spectral data over a broader range, searching for other, as yet unknown evidence in the spectra, in the hope of discriminating the two isomeric amino acids more reliably. We also explored the capabilities of such tools when applied to tandem MS spectral data.