Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Computer Science

Supervisor

Ilie, Lucian

Abstract

Visualization and interpretation of deep learning models' predictions is currently a very important area of research in machine learning. Researchers are focused not only on building models with good performance, but also on being able to trust those models. Our aim in this thesis is to adapt existing interpretation methods to a protein-protein binding site prediction problem in order to visualize and understand the model's predictions and learning patterns.

We present three deep learning-based interpretation methods: sensitivity analysis, saliency maps, and integrated gradients, which we use to identify the amino acid residues contributing positive and negative relevance to the deep learning model's predictions. As our applications use a sliding-window protocol, we are particularly interested in the model's learning patterns and in identifying the important positions within the window. We also examine feature importance through Local Interpretable Model-Agnostic Explanations (LIME).
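As an illustration of one of these methods, integrated gradients attribute a model's prediction to its input features by averaging gradients along a straight-line path from a baseline input to the actual input. The sketch below is not code from the thesis: the single-logistic-unit "model", its random weights, and the flattened 8-feature window are hypothetical stand-ins chosen only so the attribution mechanics are self-contained and checkable.

```python
import numpy as np

# Hypothetical stand-in for a trained binding-site scorer: a single
# logistic unit over a flattened sliding-window feature vector.
rng = np.random.default_rng(0)
w = rng.normal(size=8)

def model(x):
    # Sigmoid score in (0, 1), interpreted as binding probability.
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def grad(x):
    # Analytic gradient of the sigmoid score w.r.t. the input features.
    s = model(x)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline=None, steps=100):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad(baseline + a * (x - baseline))
    # Per-feature attribution: (x - baseline) * average path gradient.
    return (x - baseline) * total / steps

x = rng.normal(size=8)
attr = integrated_gradients(x)
```

A useful sanity check is the completeness axiom: the attributions should sum (up to numerical error) to the difference between the model's score at the input and at the baseline, which makes the per-residue relevance values directly comparable across windows.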

In our experiments we observe that, among the various features used, the position-specific scoring matrix (PSSM) is identified as the most important feature in helping the model recognize positive classes. As the importance of PSSMs has been established historically, this finding also carries strong biological significance.

Summary for Lay Audience

Deep learning models are often called black boxes in spite of their tremendous success in terms of accuracy. As accuracy increases from decision trees to neural networks, the complexity of the model also increases, and the higher the complexity, the lower the degree of explainability.

When we use deep learning models in sensitive areas like bioinformatics and drug production, it becomes essential to understand their behavior well. This motivates us to dig into the area of visualization and interpretation for proteins. We have adapted a few existing interpretation methods to our PPI problem to understand how the model learns while it predicts. We also focus on the important features that play a key role in the model's decisions. In this dissertation, we review the background and related work, then discuss the methods we have used, and finally present our findings from the visualizations.
