Conditional Dependence Learning of Noisy Data under Graphical Models

Abstract

Graphical models are useful tools for characterizing the conditional dependence among variables with complex structures. While many methods have been developed under graphical models, their validity depends on the quality of the data. Most available methods assume that the variables are precisely measured, an assumption that is commonly violated in practice. In addition, the frequent occurrence of missingness in data exacerbates the difficulties of estimation within the context of graphical models. Ignoring either mismeasurement or missingness effects in estimation procedures can yield biased results, so it is imperative to accommodate these effects when conducting inference under graphical models. In this thesis, we address challenges arising from noisy data with measurement error or missing observations within the framework of graphical models for conditional dependence learning.

The first project addresses mixed graphical models applied to data involving mismeasurement in discrete and continuous variables. We propose a mixed latent Gaussian copula graphical measurement error model to describe error-contaminated data with mixed continuous and discrete variables. To estimate the model parameters, we develop a simulation-based expectation-maximization method that incorporates the measurement error effects. Furthermore, we devise a computationally efficient procedure to implement the proposed method. The asymptotic properties of the proposed estimator are established, and the finite sample performance of the proposed method is evaluated by numerical studies.

Whereas the first project analyzes error-prone variables, the second project examines noisy data that are subject to both error contamination and incompleteness. We focus on the Ising model, which is designed for learning the conditional dependence structure among binary variables, and extend the conventional Ising model with additional layers of modeling to describe data with both misclassification and missingness. To estimate the model parameters with the misclassification and missingness effects accommodated simultaneously, we develop a new inferential procedure that combines the strengths of the insertion correction strategy and the inverse probability weighted method. To promote sparsity of the graphical model, we further employ regularization and accommodate a class of penalty functions, including widely used ones such as the SCAD, MCP, and HT penalties. We rigorously establish the asymptotic properties of the proposed estimators, with associated regularity conditions identified.
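For readers unfamiliar with the penalty functions named above, the following is a minimal sketch of two of them, SCAD and MCP, in their commonly cited textbook forms. It is illustrative only and is not taken from the thesis: the function names, the default shape parameters (`a=3.7`, `gamma=3.0`), and the scalar interface are assumptions, and the thesis may use a different parameterization.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty at coefficient t with tuning parameter lam.

    Assumed textbook form: linear near zero, quadratic transition,
    then constant, so large coefficients are not penalized further.
    Requires a > 2.
    """
    x = abs(t)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return (2 * a * lam * x - x * x - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(t, lam, gamma=3.0):
    """MCP penalty at coefficient t with tuning parameter lam.

    Assumed textbook form: the penalty rate decreases linearly from
    lam to zero and the penalty is constant beyond gamma * lam.
    Requires gamma > 1.
    """
    x = abs(t)
    if x <= gamma * lam:
        return lam * x - x * x / (2 * gamma)
    return gamma * lam * lam / 2


if __name__ == "__main__":
    # Both penalties match the L1 penalty's slope at zero and flatten out
    # for large |t|, which reduces bias for large coefficients.
    for t in (0.0, 0.5, 2.0, 10.0):
        print(t, scad_penalty(t, 1.0), mcp_penalty(t, 1.0))
```

Both penalties behave like the lasso near zero, which encourages sparse graphs, while leaving large coefficients essentially unpenalized.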

The third project extends the second by moving from binary variables to mixed variables, in a setting where, in addition to the target study dataset, auxiliary datasets from related studies are available, yet all data are subject to missingness. From the measurement error perspective, the target and auxiliary datasets can be regarded as accurate and error-contaminated measurements of the variables of interest, respectively. To describe the conditional dependence relationships among variables, we explore mixed graphical models characterized by exponential family distributions. Moreover, by leveraging a transfer learning strategy, we propose an inferential procedure that accounts for missingness effects and uses information from the auxiliary datasets to enhance estimation of the model parameters relevant to the target study. We evaluate the finite sample performance of the proposed methods through numerical studies.

This thesis contributes new methodologies to address challenges arising from noisy data with mismeasurement or missing values. The proposed methods broaden the application of graphical models for learning complex conditional dependencies among variables of various natures.

This item has been relocated to Western University’s Open Repository
