Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Computer Science

Collaborative Specialization

Artificial Intelligence


Zhang Kaizhong


The human leukocyte antigen (HLA) system, or complex, plays an essential role in regulating the human immune system. Accurate prediction of peptide binding to HLA can help identify neoantigens efficiently, which could make a substantial difference in immune drug development. HLA is one of the most polymorphic genetic systems in humans, with thousands of allelic variants. Because of this high polymorphism, accurately predicting binding affinity remains difficult. In this thesis, we present a new algorithm that combines a convolutional neural network with a long short-term memory network to address this problem. Compared with other currently popular algorithms, our model achieves state-of-the-art results.

Summary for Lay Audience

In recent years, deep learning has achieved significant breakthroughs and encouraging results in many areas. Various machine learning methods have also been applied in bioinformatics. Among these applications, because of the importance of the major histocompatibility complex (MHC) in immunity, many models have been developed for MHC-peptide binding prediction. However, owing to the unique properties of MHC, it is still difficult to accurately predict MHC-peptide binding specificity for all MHC alleles.

In this thesis, we propose a novel model that combines a convolutional neural network with a long short-term memory network to address this problem. We tested the model on the experimental benchmark from the Immune Epitope Database (IEDB), where it performs strongly compared with other currently popular algorithms.