Thesis Format: Integrated Article
Degree: Doctor of Philosophy
Program: Statistics and Actuarial Sciences
Supervisor: Camila P.E. de Souza
2nd Supervisor: Wenqing He, Co-Supervisor
3rd Supervisor: Felipe F. Rodrigues, Co-Supervisor (Affiliation: King's University College at Western University)
Abstract
Variational Bayesian inference is a method for analytically approximating the posterior distribution of a Bayesian model. As an alternative to Markov Chain Monte Carlo (MCMC) methods, variational inference (VI) yields an analytical approximation to the posterior at a lower computational cost than MCMC. The main challenge in applying VI is deriving the equations used to iteratively update the parameters of the approximate posterior, especially when dealing with complex data. In this thesis, we apply VI to functional data clustering and survival data analysis. The main objective is to develop novel VI algorithms and investigate their performance under these complex statistical models.
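As an illustration of what iteratively updating the parameters of the approximate posterior means, the sketch below runs coordinate-ascent variational inference (CAVI) on a textbook conjugate model: x_n ~ N(mu, 1/tau) with a Normal-Gamma prior and the mean-field factorization q(mu)q(tau). This toy model and the function name `cavi_normal` are assumptions chosen for illustration; it is not one of the models or algorithms developed in the thesis.

```python
import numpy as np


def cavi_normal(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, tol=1e-10, max_iter=500):
    """Mean-field CAVI for x_n ~ N(mu, 1/tau) (hypothetical toy example).

    Prior: mu | tau ~ N(mu0, 1/(lam0 * tau)), tau ~ Gamma(a0, b0) (shape-rate).
    Approximation: q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n).
    """
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    # mu_n and a_n do not depend on the other factor, so they are fixed.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = a0 / b0  # initial guess for E_q[tau]
    for _ in range(max_iter):
        # Update q(mu): its precision depends on the current E_q[tau].
        lam_n = (lam0 + n) * e_tau
        # Expected squared deviations under q(mu).
        e_sq_data = np.sum((x - mu_n) ** 2) + n / lam_n
        e_sq_prior = (mu_n - mu0) ** 2 + 1.0 / lam_n
        # Update q(tau) using those expectations.
        b_n = b0 + 0.5 * (e_sq_data + lam0 * e_sq_prior)
        e_tau_new = a_n / b_n
        converged = abs(e_tau_new - e_tau) < tol
        e_tau = e_tau_new
        if converged:
            break
    return mu_n, lam_n, a_n, b_n


if __name__ == "__main__":
    data = np.random.default_rng(0).normal(2.0, 0.5, size=2000)
    mu_n, lam_n, a_n, b_n = cavi_normal(data)
    print("E_q[mu] =", mu_n, " E_q[tau] =", a_n / b_n)
```

Each pass alternates closed-form updates of the two factors until E_q[tau] stops changing. The closed forms exist here only because the toy model is conjugate; deriving analogous updates for non-conjugate or more complex models is the difficulty the paragraph above describes.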
In functional data analysis, clustering aims to identify underlying groups of curves without prior group membership information. The first project in this thesis presents a novel variational Bayes (VB) algorithm for simultaneous clustering and smoothing of functional data using a B-spline regression mixture model with random intercepts. The deviance information criterion is employed to select the optimal number of clusters.
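For reference, the standard form of the deviance information criterion is shown below. Taking the expectations under the variational posterior q(θ) rather than under MCMC draws is an assumption about how it is adapted to a VB fit; the exact variant used in the thesis may differ.

```latex
D(\theta) = -2 \log p(\mathbf{y} \mid \theta), \qquad
\bar{\theta} = \mathbb{E}_{q}[\theta], \qquad
p_D = \mathbb{E}_{q}\left[ D(\theta) \right] - D(\bar{\theta}),
\qquad
\mathrm{DIC} = D(\bar{\theta}) + 2\, p_D .
```

The model is fitted for each candidate number of clusters K, and the K with the smallest DIC is selected; p_D acts as a penalty on the effective number of parameters.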
The second project shifts focus to survival data analysis and proposes a novel mean-field VB algorithm for inferring the parameters of the log-logistic accelerated failure time (AFT) model. To address intractable computations, we propose a piecewise approximation technique and incorporate it into the VB algorithm, thereby achieving Bayesian conjugacy.
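To make the log-logistic AFT model concrete, here is a minimal sketch of its right-censored log-likelihood, assuming the common parameterization log T_i = x_i^T beta + b * eps_i with standard logistic errors eps_i and scale b > 0. The function name and argument layout are hypothetical, and the sketch does not implement the thesis's VB algorithm or its piecewise approximation.

```python
import numpy as np


def loglogistic_aft_loglik(beta, b, X, t, delta):
    """Right-censored log-likelihood of a log-logistic AFT model (sketch).

    Assumed model: log T_i = x_i^T beta + b * eps_i, eps_i ~ Logistic(0, 1).
    delta_i = 1 if the event was observed, 0 if the time is right-censored.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Standardized residual on the log-time scale.
    z = (np.log(t) - X @ np.asarray(beta, dtype=float)) / b
    # log(1 + exp(z)), computed stably.
    softplus = np.logaddexp(0.0, z)
    # Observed event: log f_T(t) = z - 2*softplus(z) - log(b) - log(t)
    # Censored time:  log S_T(t) = -softplus(z)
    return float(np.sum(delta * (z - np.log(b) - np.log(t)) - (1.0 + delta) * softplus))
```

The log(1 + e^z) term is not quadratic in beta, so under a normal prior the expectations required by a mean-field update are not available in closed form. This is the kind of obstacle that an approximation of that term is intended to remove.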
The third project is motivated by invasive mechanical ventilation data from intensive care units (ICUs) in Ontario, Canada, in which patients are grouped into multiple ICU clusters. We assume that patients within the same ICU cluster are correlated. Extending the second project's methodology, we introduce a shared frailty log-logistic AFT model that accounts for intra-cluster correlation through a cluster-specific random intercept, and we present a novel, fast VB algorithm for inferring its parameters.
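One way to write a shared frailty log-logistic AFT model is sketched below, for ICU i = 1, ..., K and patient j = 1, ..., n_i. The normal distribution for the cluster-specific random intercept ω_i is an assumption made here for illustration and may not match the thesis's specification.

```latex
\log T_{ij} = \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + \omega_i + b\, \varepsilon_{ij},
\qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} \operatorname{Logistic}(0, 1),
\qquad \omega_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma_{\omega}^{2}),

S(t \mid \mathbf{x}_{ij}, \omega_i)
  = \left[ 1 + \exp\!\left( \frac{\log t - \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} - \omega_i}{b} \right) \right]^{-1}.
```

Because every patient in ICU i shares the same ω_i, their log survival times are shifted together, which induces the intra-cluster correlation; setting ω_i = 0 recovers the ordinary log-logistic AFT model.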
Extensive simulation studies assess the performance of the proposed VB algorithms, comparing them with other methods, including MCMC algorithms. Applications to real data, such as ICU ventilation data from Ontario, illustrate the methodologies' practical use. The proposed VB algorithms demonstrate excellent performance in clustering functional data and analyzing survival data, while significantly reducing computational cost compared to MCMC methods.
Summary for Lay Audience
Variational inference (VI) is a statistical technique used to estimate model parameters within a Bayesian framework. It can provide accuracy similar to that of the traditional Markov Chain Monte Carlo (MCMC) method while being more efficient. My research explores applying VI in two key areas: clustering time-varying data (like daily temperature changes) and analyzing time-to-event data (such as hospital stay durations).
The first study introduces a new approach to group time-varying data into meaningful clusters, helping us identify underlying patterns without prior knowledge of these groups. For instance, this method could reveal geographical areas with similar weather patterns based on daily temperature data. We also use an information criterion to determine the optimal number of clusters.
The second study focuses on time-to-event (or survival time) data. We propose a new method to analyze how risk factors affect the time until an event occurs and to predict survival times. This is particularly valuable in medical research, where understanding the lifespan of patients with specific conditions is crucial.
The third study is motivated by data on ventilation duration in intensive care units (ICUs) in Ontario, Canada. ICU patients often share similar environments, and our method accounts for these similarities to provide more accurate predictions of survival time (i.e., ventilation duration).
Each project includes extensive simulation studies that demonstrate the effectiveness of our proposed methods. We also apply these methodologies to various real-world datasets. Overall, my research aims to make advanced data analysis tools more accessible and efficient, ultimately supporting better decision-making in fields like healthcare.
Recommended Citation
Xian, Chengqian, "Variational Bayesian inference for functional data clustering and survival data analysis" (2024). Electronic Thesis and Dissertation Repository. 10424.
https://ir.lib.uwo.ca/etd/10424