Thesis Format
Monograph
Degree
Doctor of Philosophy
Program
Statistics and Actuarial Sciences
Supervisor
Yi, Grace Y.
Abstract
Graphical models are useful tools for characterizing the conditional dependence among variables with complex structures. While many methods have been developed under graphical models, their validity is vulnerable to the quality of data. A fundamental assumption associated with most available methods is that the variables need to be precisely measured. This assumption is, however, commonly violated in reality. In addition, the frequent occurrence of missingness in data exacerbates the difficulties of estimation within the context of graphical models. Ignoring either mismeasurement or missingness effects in estimation procedures can yield biased results, and it is imperative to accommodate these effects when conducting inferences under graphical models. In this thesis, we address challenges arising from noisy data with measurement error or missing observations within the framework of graphical models for conditional dependence learning.
The first project addresses mixed graphical models applied to data involving mismeasurement in discrete and continuous variables. We propose a mixed latent Gaussian copula graphical measurement error model to describe error-contaminated data with mixed continuous and discrete variables. To estimate the model parameters, we develop a simulation-based expectation-maximization method that incorporates the measurement error effects. Furthermore, we devise a computationally efficient procedure to implement the proposed method. The asymptotic properties of the proposed estimator are established, and the finite sample performance of the proposed method is evaluated by numerical studies.
In contrast to analyzing error-prone variables in the first project, we further examine variables that are susceptible to not only mismeasurement but also missingness. In the second project, we examine noisy data that are subject to both error contamination and incompleteness, focusing on the Ising model designed for learning the conditional dependence structure among binary variables. We extend the conventional Ising model using additional layers of modeling to describe data with both misclassification and missingness. To estimate the model parameters with the misclassification and missingness effects accommodated simultaneously, we develop a new inferential procedure that combines the strengths of the insertion correction strategy and the inverse probability weighted method. To facilitate the sparsity of the graphical model, we further employ the regularization technique and accommodate a class of penalty functions, including widely used penalty functions such as SCAD, MCP, and HT penalties. We rigorously establish the asymptotic properties of the proposed estimators, with associated regularity conditions identified.
The third project extends the second one by moving from binary variables to mixed variables, where, in addition to the target study dataset, auxiliary datasets from related studies are available, yet all data are subject to missingness. From the measurement error perspective, the target and auxiliary datasets can be regarded as accurate and error-contaminated measurements for the variables of interest, respectively. To describe the conditional dependence relationships among variables, we explore mixed graphical models characterized by the exponential family distributions. Moreover, by leveraging a transfer learning strategy, we propose an inferential procedure that accounts for missingness effects to enhance estimation of the model parameters relevant to the target study using information from auxiliary datasets. We evaluate the finite sample performance of the proposed methods through numerical studies.
This thesis contributes new methodologies to address challenges arising from noisy data with mismeasurement or missing values. The proposed methods broaden the application of graphical models for learning complex conditional dependencies among variables of various natures.
Summary for Lay Audience
Graphical models are valuable tools for characterizing conditional dependence among variables with complex structures. While many methods have been developed for such models, their validity is often compromised by data quality. A key assumption is that variables must be precisely measured, which is frequently violated in practice. In addition, missing values further complicate inferential procedures. Ignoring measurement errors or missing data can lead to erroneous results, and it is crucial to address these effects for valid inference.
This thesis addresses challenges arising from noisy data within the framework of graphical models for conditional dependence learning. We propose new models and methodologies that accommodate mismeasurement and/or missingness in data. For mixed continuous and discrete variables, we develop a mixed latent Gaussian copula graphical measurement error model and a simulation-based expectation-maximization method to estimate model parameters, incorporating measurement error effects. Additionally, we extend the Ising model to account for misclassification and missing data among binary variables, and we develop a new inferential procedure for parameter estimation.
Furthermore, we explore mixed graphical models characterized by exponential family distributions and leverage transfer learning techniques to enhance parameter estimation using auxiliary datasets from related studies alongside the target dataset. This approach enables us to address both mismeasurement and missingness, enhancing the applicability of graphical models for conditional dependence learning.
In summary, this thesis develops new methodologies to address challenges arising from noisy data within graphical models and broadens their application to handle data with complex structures.
Recommended Citation
Shi, Yu, "Conditional Dependence Learning of Noisy Data under Graphical Models" (2024). Electronic Thesis and Dissertation Repository. 10426.
https://ir.lib.uwo.ca/etd/10426
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.