Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Pathology and Laboratory Medicine


Poon, Art


Viral infection requires the interaction between virus surface-exposed (SE) proteins and host cell receptors. This can result in an “arms race” that is assumed to drive accelerated rates of evolution, and some well-known examples of diversifying selection involve surface proteins (HIV-1 env, influenza hemagglutinin). We conducted a systematic analysis to determine whether this is truly a distinctive feature of SE virus proteins, in comparison to non-SE proteins encoded by the same genomes.

We obtained the reference genome and all neighbour genomes for each of 52 human viruses from the NCBI Viral Genomes database. The coding sequences (CDSs) of each genome were extracted by pairwise alignment against the reference CDSs, and labeled as SE or non-SE using the Gene Ontology database and the transmembrane predictor TMbed. After generating codon-aware multiple sequence alignments, we used FUBAR to estimate the joint probability distribution over 20 non-synonymous and synonymous rates for each alignment (the evolutionary fingerprint). We calculated the cosine distance between every pair of fingerprints and visualized the results using PCA.
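The fingerprint comparison described above can be illustrated with a minimal sketch. It assumes each fingerprint is available as a grid of probabilities over rate classes and flattens each grid into a vector before taking the cosine distance. The function names and the toy 2 x 2 grids are hypothetical and are not taken from the analysis pipeline; the PCA step is omitted.

```python
import math


def cosine_distance(p, q):
    """Cosine distance between two fingerprints given as equal-shaped 2D grids."""
    # Flatten each grid row by row into a single vector.
    a = [x for row in p for x in row]
    b = [x for row in q for x in row]
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same number of cells")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("fingerprints must not be all zeros")
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(fingerprints):
    """Symmetric matrix of pairwise cosine distances."""
    n = len(fingerprints)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(fingerprints[i], fingerprints[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Toy fingerprints (hypothetical values): probability mass on a 2 x 2 rate grid.
fp_a = [[0.7, 0.1], [0.1, 0.1]]
fp_b = [[0.1, 0.1], [0.1, 0.7]]
fp_c = [[0.7, 0.1], [0.1, 0.1]]  # identical to fp_a

dist = distance_matrix([fp_a, fp_b, fp_c])
```

Identical fingerprints have a distance of zero, and the distance grows as the probability mass shifts to different rate classes. A matrix of this kind, or the flattened vectors themselves, could then be passed to an ordination method such as PCA.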

In total, we analyzed 670 sets of homologous genes (125 of which were SE) from 21 virus families. We found no clear separation of SE from non-SE labels by PCA. Additionally, there were no significant differences between SE and non-SE genes in the codon site-specific mean dN/dS ratios, dN−dS differences, dN or dS independently, or the percentage of positively and/or negatively selected sites (Wilcoxon rank sum test, significance threshold of 0.05).
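For readers unfamiliar with the Wilcoxon rank sum test, the sketch below shows how such a comparison between two groups of per-gene values can be computed. It uses the normal approximation with a tie correction and no continuity correction, so its p-values can differ slightly from those of statistical packages that use exact or continuity-corrected methods. The function name and the toy values are hypothetical and do not come from the study data.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test using the normal approximation.

    Returns (U statistic for x, p-value). Ties receive average ranks and
    the variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; assign their average.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0.0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


# Toy per-gene mean dN/dS values (hypothetical, for illustration only).
se_values = [0.21, 0.35, 0.18, 0.40, 0.27]
non_se_values = [0.25, 0.30, 0.22, 0.33, 0.19, 0.38]

u_stat, p_value = rank_sum_test(se_values, non_se_values)
significant = p_value < 0.05
```

Because the test compares ranks rather than raw values, it does not assume that the per-gene estimates are normally distributed.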

In closing, we did not find evidence that human virus genes encoding surface-exposed virus proteins undergo higher rates of adaptation than other protein-coding regions in the viral genome.

Summary for Lay Audience

Rapid evolution of viruses makes controlling virus infections extremely challenging. Therefore, it is important to further our understanding of how viruses evolve, and how this is shaped by the interaction between a virus and its host. For instance, the host immune system often targets the surface-exposed (SE) components of a virus. As a result, there is an ongoing host-virus arms race where the SE components of virus proteins are constantly changing to escape detection by the host immune system, which in turn is constantly adapting to recognise and bind the virus proteins. A recent systematic study by Wang et al. (2020) found evidence of elevated rates of evolution in the host receptor proteins used by viruses, and other proteins expressed on the surface of host cells that directly interact with viruses, in primate genomes.

In this study, we examined whether the SE proteins of viruses also evolve at elevated rates. Our aim was to conduct a systematic analysis of the evolutionary patterns in human virus genomes, by measuring the selective pressures acting on protein-coding genes, and comparing these estimates between SE and non-SE virus proteins. We hypothesized that the genes encoding SE virus proteins evolve at higher rates than other protein-coding regions in the viral genome.

After examining genome sequences of 52 human viruses from the National Center for Biotechnology Information (NCBI), with a total of 670 genes (125 SE) belonging to 21 different virus families, we found no difference in the rate of evolution between SE and non-SE proteins.