Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Epidemiology and Biostatistics

Collaborative Specialization

Biostatistics

Supervisor

Choi, Yun-Hee

2nd Supervisor

Espin-Garcia, Osvaldo

Co-Supervisor

Abstract

Multiple imputation (MI) is a widely adopted approach for handling missing data and has proven to be a robust tool, particularly when dealing with large sets of covariates with missing values. When data are missing at random, multiple imputation generally outperforms complete case analysis. However, in the analysis of clustered survival data arising from family-based studies with missing covariates, current MI methods account for neither the familial structure of the data nor the ascertainment of families. Our study proposes integrating the kinship matrix into the multiple imputation process by calculating the conditional means and variances of an individual’s missing data given the observations of other family members, thereby explicitly incorporating family structure information. We compare the performance of our proposed methods with that of commonly used multiple imputation methods that do not consider the kinship matrix and with complete case analysis. Our findings indicate that failing to account for familial correlation when imputing genetically associated variables results in slightly higher bias and liberal variance estimates. The proposed MI method is applied to breast cancer families recruited from the Breast Cancer Family Registries to evaluate the effects of the polygenic risk score (PRS) and mutation status, where the PRS is subject to missingness.
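To illustrate the conditioning step described in the abstract, the following is a minimal sketch and not the thesis's actual algorithm. It assumes the trait is multivariate normal within a family, with covariance sigma2 * (2 * h2 * K + (1 - h2) * I), where K is the kinship matrix; this covariance model, the parameters mu, sigma2 and h2, and the function names solve and conditional_moments are all hypothetical choices made for the example. Ascertainment of families is not handled here.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def conditional_moments(kinship, values, target, mu, sigma2, h2):
    """Conditional mean and variance of one member's missing value.

    Assumed model (an illustration, not taken from the thesis):
    Cov(y_i, y_j) = sigma2 * (2 * h2 * K_ij + (1 - h2) * [i == j]).
    Entries of `values` equal to None are treated as missing.
    """
    n = len(kinship)
    cov = [
        [
            sigma2 * (2.0 * h2 * kinship[i][j] + (1.0 - h2) * (1.0 if i == j else 0.0))
            for j in range(n)
        ]
        for i in range(n)
    ]
    observed = [i for i in range(n) if i != target and values[i] is not None]
    if not observed:
        # No relatives observed: fall back to the marginal distribution.
        return mu, cov[target][target]
    cov_oo = [[cov[i][j] for j in observed] for i in observed]
    cov_mo = [cov[target][j] for j in observed]
    residuals = [values[j] - mu for j in observed]
    # weights = Sigma_oo^{-1} Sigma_om (Sigma_oo is symmetric).
    weights = solve(cov_oo, cov_mo)
    mean = mu + sum(w * r for w, r in zip(weights, residuals))
    variance = cov[target][target] - sum(w * c for w, c in zip(weights, cov_mo))
    return mean, variance


# Father, mother, child: kinship 0.5 with oneself, 0.25 parent-child,
# 0 between unrelated parents.
K = [
    [0.50, 0.00, 0.25],
    [0.00, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
prs = [1.0, 0.0, None]  # the child's value is missing

mean, variance = conditional_moments(K, prs, target=2, mu=0.0, sigma2=1.0, h2=1.0)
print(mean, variance)  # 0.5 0.5

# One imputed value; repeating this draw gives multiple imputations.
imputed = random.gauss(mean, variance ** 0.5)
```

With h2 = 1 the child's conditional mean is the average of the two parental values and the conditional variance is half the marginal variance, which is the behaviour a kinship-aware imputation is meant to capture. A full MI procedure would typically also draw mu, sigma2 and h2 from their posterior distribution before each imputation rather than fixing them as done here.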

Summary for Lay Audience

Data analysis can be challenging when some values are missing. Multiple imputation (MI) is a robust tool for handling missing data: it fills in the missing values several times and averages the results to obtain final estimates. In family-based studies, some variables may be genetically correlated between individuals within the same family because of inheritance. In our research, we propose MI algorithms that incorporate these familial correlations by integrating the kinship matrix when the variables subject to missingness are genetically correlated. Simulations and data analysis of BRCA1/2 families demonstrated that our method provides greater consistency and accuracy than existing approaches in most cases.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Available for download on Friday, December 18, 2026
