Electronic Thesis and Dissertation Repository


Doctor of Philosophy


Statistics and Actuarial Sciences

Collaborative Specialization



Dean, Charmaine

2nd Supervisor

Kulperger, Reg

Joint Supervisor


Mutations are alterations of the DNA nucleotide sequence of the genome. Analyses of the spatial properties of mutations are critical for understanding certain mutational mechanisms relevant to genetic disease, diversity, and evolution. The studies in this thesis focus on two types of mutations: point mutations, i.e., single nucleotide polymorphism (SNP) genotype differences, and mutations in segments, i.e., copy number variations (CNVs). Microarray platforms, such as the Mouse Diversity Genotyping Array (MDGA), detect these mutations genome-wide at lower cost than whole-genome sequencing and are therefore considered for suitability as screening tools for large populations. By design, however, they observe mutations with a high degree of missingness across the genome, which poses challenges for statistical analysis. Three topics are studied in this thesis: the development of formal statistical tools for detecting the existence of point mutation clusters under the microarray platform; the evaluation of the performance of the resulting test statistics, in terms of their ability to detect mutation clusters, while accounting for various probe designs; and the development of formal statistical tools for testing the existence of spatial association between point mutations and mutations in segments. Statistical models such as Poisson point processes and Neyman-Scott processes are used for the distributions of the locations of point mutations under the null and alternative hypotheses. Monte Carlo frameworks are established for statistical inference and for evaluating the power of the proposed test statistics. Tests with desirable performance are identified and recommended as screening tools. These statistical tools can also be used to study other genomic events that take the form of point events and events in segments, and with microarray platforms other than the MDGA used here.
Simulated probe sets based on a window-based probe design mimicking the design of the MDGA are used to study the effect of various factors in probe design on the performance of the test statistics. Insights are offered for determining key features of such a design, such as probe intensity, when designing a new microarray platform, in order to achieve the desired power for mutation cluster detection.
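
As an illustration only, the idea of a Monte Carlo test for point mutation clusters observed at fixed probe locations can be sketched as follows. This is a minimal sketch under stated assumptions and not the test statistics or models developed in the thesis: the statistic (`close_pair_count`), the function `monte_carlo_p_value`, the evenly spaced probe layout, and the example mutation positions are all hypothetical. The null hypothesis assumed here is that the mutated probes are a simple random subset of all probes, which conditions on the probe design and so respects the missingness between probes.

```python
import random


def close_pair_count(positions, max_gap):
    """Count consecutive mutated probes lying within max_gap of each other.

    Hypothetical clustering statistic chosen for illustration only.
    """
    pts = sorted(positions)
    return sum(1 for a, b in zip(pts, pts[1:]) if b - a <= max_gap)


def monte_carlo_p_value(probes, mutated, max_gap, n_sim=999, seed=0):
    """Monte Carlo p-value for clustering of mutated probes.

    Assumed null model: the mutated probes are a simple random sample,
    without replacement, from the fixed set of probe positions.
    """
    rng = random.Random(seed)
    observed = close_pair_count(mutated, max_gap)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Redraw the same number of mutated probes from the probe set.
        simulated = rng.sample(probes, len(mutated))
        if close_pair_count(simulated, max_gap) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed data as one of the draws.
    return (at_least_as_extreme + 1) / (n_sim + 1)


# Hypothetical probe layout: one probe every 50 positions.
probes = list(range(0, 10000, 50))
# Hypothetical observed mutations: a tight run of five plus two isolated ones.
mutated = [1000, 1050, 1100, 1150, 1200, 5000, 8000]

p_value = monte_carlo_p_value(probes, mutated, max_gap=50)
print(f"Monte Carlo p-value: {p_value:.3f}")
```

Resampling from the probe positions themselves, rather than from the whole genome, is the design choice that matters in this sketch: an uneven probe layout then affects the simulated data in the same way it affects the observed data. Power could be explored in the same spirit by generating mutations from a clustered alternative, such as a Neyman-Scott process, and recording how often the test rejects.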

Included in

Biostatistics Commons