Electronic Thesis and Dissertation Repository

Visualization and Interpretation of Protein Interactions

Dipanjan Chatterjee, The University of Western Ontario

Abstract

Visualizing and interpreting the predictions of deep learning models is an important area of current machine learning research. Researchers want not only a model that performs well, but also one whose predictions they can trust. In this thesis, we adapt existing interpretation methods to a protein-protein binding site prediction problem in order to visualize and understand the model's predictions and learning patterns.

We present three deep learning-based interpretation methods (sensitivity analysis, saliency maps, and integrated gradients) to analyze which amino acid residues contribute positive or negative relevance to the model's prediction. Because our applications use a sliding-window protocol, we are particularly interested in the model's learning patterns and in identifying the important positions within the window. We also examine feature importance using Local Interpretable Model-Agnostic Explanations (LIME).
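
To make the gradient-based methods named above concrete, the following is a minimal sketch and not the implementation used in this thesis. It assumes a toy logistic model over a hypothetical five-position window with one numeric feature per position; the weights, bias, window values, and all-zero baseline are invented for illustration. Sensitivity is taken here as the squared partial derivative and saliency as the absolute partial derivative, which are common conventions but only assumptions about the definitions used. Integrated gradients are approximated with a midpoint Riemann sum along the straight path from the baseline to the input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Toy stand-in for a trained model: logistic regression over the window."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def gradient(weights, bias, x):
    """Analytic gradient of the toy model's output with respect to each input."""
    p = predict(weights, bias, x)
    return [p * (1.0 - p) * w for w in weights]


def sensitivity(weights, bias, x):
    """Squared partial derivatives (one common convention, assumed here)."""
    return [g * g for g in gradient(weights, bias, x)]


def saliency(weights, bias, x):
    """Absolute partial derivatives (one common convention, assumed here)."""
    return [abs(g) for g in gradient(weights, bias, x)]


def integrated_gradients(weights, bias, x, baseline, steps=200):
    """Midpoint-rule approximation of integrated gradients."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        g = gradient(weights, bias, point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Scale the path-averaged gradient by the input's distance from the baseline.
    return [(v - b) * a for v, b, a in zip(x, baseline, avg_grad)]


# Hypothetical values, chosen only for illustration.
weights = [0.2, -0.5, 1.5, 0.8, -0.1]
bias = -0.3
window = [1.0, 0.5, 2.0, 1.0, 0.0]
baseline = [0.0] * len(window)

attributions = integrated_gradients(weights, bias, window, baseline)
most_relevant_position = max(range(len(window)), key=lambda i: attributions[i])
```

In this sketch, a positive attribution marks a window position that pushes the prediction toward the positive class relative to the baseline, and a negative attribution marks one that pushes against it. The attributions also sum approximately to the difference between the model's output at the input and at the baseline, which gives a simple sanity check on the approximation.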

Our experiments show that, despite the use of various features, the position-specific scoring matrix (PSSM) is identified as the most important feature in helping the model identify positive classes. Because the importance of PSSMs is well established in earlier work, this finding is also biologically meaningful.