Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Pathology and Laboratory Medicine


Poon, Art F.Y.


The persistent latent reservoir of long-lived cells carrying integrated HIV DNA is the source of reinfection upon treatment interruption and a primary focus of cure research. The reservoir is difficult to study because these cells are relatively rare or are located in tissues that are difficult to sample. Sequencing proviral DNA from the latent reservoir provides important information about reservoir establishment and persistence, particularly through the presence of identical (clonal) sequences. I evaluated the relationship between selected measures of these clonal sequences and drivers of reservoir persistence, such as clonal expansion, by implementing a simulation model of within-host HIV dynamics in actively and latently infected cells. I implemented a discrete-event simulation in the R package treeswithintrees, with four populations of cells corresponding to active, latent, replenishment and death compartments. To simulate molecular evolution on the resulting trees, I collapsed branches representing infected cells in a latent state and ran the program INDELible with parameters calibrated to HIV-1 on a representative env sequence. I propose a new clonality statistic (pairwise clonality) that captures the genetic diversity of a sample with less information loss than existing statistics. I then evaluated how two clonality statistics used in the literature (the proportion of identical sequences and the Gini coefficient) and my proposed statistic (the number of identical pairwise comparisons) respond to changes in simulation parameter values by fitting a general linear model (GLM). I found that the two existing clonality statistics were less robust than the proposed pairwise clonality score. In addition, there were significant associations between the clonality statistics and the simulation parameters. Finally, I implemented a particle filtering method to evaluate non-linear relationships between the simulation parameters and the clonality scores.
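As an illustration only, the three clonality statistics named above can be sketched in Python. The definitions here are assumptions rather than the thesis's exact formulas: proportional clonality is taken as the fraction of sequences that match at least one other sequence in the sample, the Gini coefficient is computed over clone sizes, and pairwise clonality is taken as the number of identical pairs divided by the total number of pairs. The function names are hypothetical.

```python
from collections import Counter


def proportion_clonal(seqs):
    """Fraction of sequences identical to at least one other sequence (assumed definition)."""
    counts = Counter(seqs)
    return sum(c for c in counts.values() if c > 1) / len(seqs)


def gini(seqs):
    """Gini coefficient of the clone-size distribution (0 means all clones are equal in size)."""
    sizes = sorted(Counter(seqs).values())
    n = len(sizes)
    total = sum(sizes)
    # Standard rank-based formula for sorted values x_1 <= ... <= x_n:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    return sum((2 * i - n - 1) * x for i, x in enumerate(sizes, start=1)) / (n * total)


def pairwise_clonality(seqs):
    """Identical pairs divided by all pairs (normalization is an assumption)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences to form a pair")
    # A clone of size c contributes c * (c - 1) / 2 identical pairs.
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(seqs).values())
    return identical_pairs / (n * (n - 1) // 2)


# Toy sample: one clone of size 3, one of size 2, and one unique sequence.
sample = ["ACGT", "ACGT", "ACGT", "ACGA", "TTGA", "TTGA"]
print(proportion_clonal(sample), gini(sample), pairwise_clonality(sample))
```

Under these assumed definitions, the pairwise score uses the full clone-size distribution, whereas the proportional score only records whether a sequence has any duplicate.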

Summary for Lay Audience

Human Immunodeficiency Virus (HIV) causes a disease with a significant social and economic burden and no cure; it is currently manageable through combination Antiretroviral Therapy (cART). However, cART is unable to remove viruses that have inserted themselves into the DNA of host cells (the latent reservoir). The latent reservoir is relatively stable and long-lasting, and it is a source of re-infection if cART is halted or otherwise made ineffective. The latent reservoir is difficult to study in humans for various reasons, so simulations can be helpful for studying the evolution and dynamics of HIV within a host. I created a simulation framework, "simclone", that can simulate within-host evolution of HIV. Events of evolutionary significance during the course of an HIV infection were simulated: acute infection, cART initiation and chronic infection. These simulations are then used to simulate evolution on an HIV-1 sequence. Clonality statistics (i.e., how many sequences in a sample are identical) are used to understand the latent reservoir, but the commonly used statistics (proportional clonality and the Gini coefficient) have significant drawbacks. I propose the "pairwise clonality" statistic as a response to this issue. I then analyze the simulation model to see whether any parameter values (e.g., infection rate) are associated with increased clonality scores for each of the metrics. Finally, I implement a particle filtering method to attempt to estimate simulation parameters from clonality scores. The results shown in this work i) provide a framework for simulating within-host evolution that enables testing of specific hypotheses, ii) propose a clonality statistic that has better statistical properties than existing measures, and iii) highlight the difficulties in using particle filtering for parameter estimation.