Date of Award

2009

Degree Type

Thesis

Degree Name

Master of Science

Program

Applied Mathematics

Supervisor

Dr. Lindi Wahl

Abstract

Mutual information (MI) is a measure frequently used to find co-evolving sites in protein families. However, factors unrelated to protein structure and function, in particular sampling variance in amino acid counts and complex evolutionary relationships among sequences, contribute to MI. Understanding the contribution of these components is essential for isolating the MI associated with structural or functional co-evolution. To date, the contributions of these factors to mutual information have not been fully elucidated.

We find that stochastic variations in amino acid counts and shared phylogeny each contribute substantially to measured MI. Nonetheless, the mutual information observed in real-world protein families is consistently higher than the expected contribution of these two factors. In contrast, when using synthetic data with realistic substitution rates and phylogenies, but without structural or functional constraints, the observed levels of MI match those expected due to stochastic and phylogenetic background.

Our results suggest that either low levels of co-evolution are ubiquitous across positions in protein families, or some unknown factor exists beyond the currently hypothesized components of intra-protein mutual information: sampling variance, phylogenetics and structural/functional co-evolution.
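
As an illustration of the quantities discussed in the abstract, the following is a minimal sketch of how MI between two alignment columns can be computed, and of how a sampling-variance background might be estimated by shuffling one column. It assumes the standard plug-in (frequency-count) estimator in bits; the abstract does not state which estimator or background procedure the thesis uses, so the function names `mutual_information` and `shuffled_background` are hypothetical. The shuffle removes any association between the two columns, so it captures only the sampling contribution and not the phylogenetic one.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Plug-in MI (in bits) between two equal-length alignment columns.

    Assumption: probabilities are estimated directly from residue counts.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    n = len(col_a)
    count_a = Counter(col_a)               # marginal counts at site A
    count_b = Counter(col_b)               # marginal counts at site B
    count_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written in terms of counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def shuffled_background(col_a, col_b, trials=200, seed=0):
    """Mean MI after shuffling col_b, an estimate of the MI expected
    from finite sampling alone (hypothetical helper, not from the thesis)."""
    rng = random.Random(seed)
    shuffled = list(col_b)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += mutual_information(col_a, shuffled)
    return total / trials


if __name__ == "__main__":
    site_a = "AACC"
    site_b = "GGTT"
    print(mutual_information(site_a, site_b))
    print(shuffled_background(site_a, site_b))
```

Even for columns with no true association, the shuffled background is generally greater than zero when the number of sequences is small, which is the sampling effect the abstract refers to.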
