Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article

Degree

Master of Science

Program

Computer Science

Supervisor

Ilie, Lucian

Abstract

The accurate prediction of protein-protein interaction (PPI) binding sites is a fundamental problem in bioinformatics, since proteins most often perform their functions by interacting with other proteins. Experimental methods are slow, expensive, and not very accurate, hence the need for efficient computational methods.

In this thesis, we perform a study aiming to improve the performance of DELPHI, currently the best program for binding site prediction. We employ some of the best current machine learning techniques, including attention and various embedding techniques such as BERT and ELMo. This is the first time such tools have been tested for this problem. We tested many architectures on a large dataset and analyzed our findings. While we succeeded in improving the performance, it is interesting to note that some of the best machine learning techniques failed to provide the expected improvement, a fact that will require further investigation.

Summary for Lay Audience

Proteins are an essential component of every organism and take part in every process within living cells. Cells often function through the interaction of proteins with other proteins, a mechanism known in biology as protein-protein interaction (PPI). Predicting the amino acids that take part in binding, also referred to as PPI binding sites, is therefore a fundamental problem in computational biology.

We present a thorough study that aims to improve the performance of DELPHI, currently the best program for predicting binding sites. To achieve this, we implemented some of the best recent techniques from machine learning and tested many architectures on a large dataset to analyze our findings. Interestingly, although we succeeded in improving the performance, some of the best machine learning techniques failed to provide the expected improvement, which calls for further investigation.
