
Thesis Format
Integrated Article
Degree
Doctor of Philosophy
Program
Statistics and Actuarial Sciences
Supervisor
He, Wenqing
2nd Supervisor
Kumbhare, Dinesh
Affiliation
University of Toronto
Co-Supervisor
Abstract
This thesis focuses on developing advanced clustering methods and analyzing data arising from chronic pain (CP) studies, with a particular emphasis on the unique challenges posed by self-reported (SR) data. Latent class analysis (LCA) is explored in the early stages of this work to cluster patients, and the resulting clusters are compared to identify features that differ significantly among them. While LCA is effective for categorical variables, it does not address the mixed data types and subjective biases inherent in SR data. To overcome these limitations, we propose a novel distance metric tailored specifically to SR questionnaire data. This distance combines the correlation distance with other elementary distances to cluster data of mixed type, and it outperforms existing metrics on mixed data when SR variables are present. Additionally, interpretable clustering techniques are used to generate simple, actionable rules that can be applied in clinical practice. To integrate the domain knowledge of CP experts into the clustering process, a semi-supervised clustering algorithm is introduced that adjusts the distance metric using pairwise constraints provided by CP experts. We develop a two-step active learning query strategy to identify and query the most informative patient cases, improving query efficiency and minimizing the number of interactions required between experts and the algorithm. In addition to clustering, we analyze data arising from CP studies and explore predictive modeling. Canonical correlation analysis (CCA) is applied to investigate relationships among CP measurements, revealing important connections between pain characteristics and psychological factors. Furthermore, multiple classification models are used to predict nociplastic pain, and the best cutoff for each predictor is investigated using the prediction model.
Overall, this thesis contributes to CP research by introducing novel methods for clustering CP patients and for analyzing complex relationships in CP data. The proposed approaches emphasize clinical applicability, interpretability, and the integration of domain knowledge, offering practical solutions for real-world challenges in CP management. These advancements provide a foundation for further exploration of personalized treatment strategies and an improved understanding of chronic pain mechanisms.
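The abstract describes the proposed SR distance only at a high level: a correlation distance combined with elementary distances for mixed-type data. The Python sketch below illustrates that general idea under stated assumptions; it is not the metric defined in the thesis. It assumes each patient is a hypothetical record with numeric variables (compared with a range-normalized absolute difference, as in Gower's distance), categorical variables (compared by simple matching) and a block of SR questionnaire items (compared with the correlation distance (1 − r)/2). The three components are averaged with hypothetical equal weights. Because the correlation distance ignores shifts and positive rescaling of a respondent's answers, two patients with the same response pattern but different personal rating tendencies come out identical on the SR block.

```python
import numpy as np


def correlation_distance(u, v):
    """(1 - Pearson r) / 2, scaled to [0, 1].

    Invariant to adding a constant to, or positively rescaling, either
    response vector, so a respondent's overall rating tendency is ignored.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc, vc = u - u.mean(), v - v.mean()
    denom = np.sqrt((uc ** 2).sum() * (vc ** 2).sum())
    if denom == 0:
        # Assumption: a constant response vector has no defined correlation,
        # so it is treated as uncorrelated (r = 0, distance 0.5).
        return 0.5
    r = (uc * vc).sum() / denom
    return (1.0 - r) / 2.0


def mixed_distance(a, b, num_range, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted average of numeric, categorical and SR distances."""
    # Numeric part: Gower-style range-normalized absolute difference.
    diff = np.abs(np.asarray(a["num"], float) - np.asarray(b["num"], float))
    d_num = np.mean(diff / num_range)
    # Categorical part: proportion of mismatched categories (simple matching).
    d_cat = np.mean([x != y for x, y in zip(a["cat"], b["cat"])])
    # SR part: correlation distance between questionnaire response vectors.
    d_sr = correlation_distance(a["sr"], b["sr"])
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, [d_num, d_cat, d_sr]) / w.sum())


def distance_matrix(patients, weights=(1.0, 1.0, 1.0)):
    """Symmetric pairwise distance matrix for a list of patient records."""
    nums = np.array([p["num"] for p in patients], dtype=float)
    num_range = nums.max(axis=0) - nums.min(axis=0)
    num_range[num_range == 0] = 1.0  # avoid division by zero for constant columns
    n = len(patients)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = mixed_distance(patients[i], patients[j], num_range, weights)
    return D


# Toy records (hypothetical, illustrative only):
# num = [age, pain score], cat = [sex, smoker], sr = 1-7 Likert items.
patients = [
    {"num": [40, 6], "cat": ["F", "yes"], "sr": [1, 2, 3, 4, 5]},  # A
    {"num": [40, 6], "cat": ["F", "yes"], "sr": [2, 3, 4, 5, 6]},  # B: same pattern, one point higher
    {"num": [60, 2], "cat": ["M", "no"], "sr": [5, 4, 3, 2, 1]},   # C: reversed pattern
]
D = distance_matrix(patients)
print(np.round(D, 3))
```

In this toy example, patients A and B differ only in that B rates every SR item one point higher, so their distance is 0; C reverses the SR pattern and differs on every other variable, giving the maximum distance of 1. A matrix like this can be passed to any clustering method that accepts precomputed distances, such as hierarchical clustering. The records are illustrative and are not study data.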
Summary for Lay Audience
This research focuses on improving how we group and understand patients with chronic pain (CP) based on their unique characteristics and experiences. Chronic pain is a complex condition, often involving both physical and psychological factors, which makes it challenging to study and treat effectively. Many patients complete detailed questionnaires about their pain and mental health, which provide valuable insights but are also subjective and difficult to analyze. To address this, we developed new methods to group patients based on their questionnaire responses and other data. These methods aim to make the grouping process more accurate and easier for clinicians to interpret. For example, one part of the research introduces a special way to measure the similarity between patients, taking into account both numerical data, like age, and subjective data, like how patients describe their pain. This helps create groups of patients with similar characteristics. Another part of the research improves these methods by involving input from experts who can guide the grouping process. We designed techniques that ask the experts the most important questions first to save time and effort. In addition to grouping patients, we studied how different factors, like pain levels and mental health conditions, are related to each other. We also developed a profile to predict a specific type of pain, nociplastic pain, which may not have an obvious physical cause but can significantly affect a person's life. Overall, this thesis contributes better ways of analyzing chronic pain data. The goal is to help doctors identify patterns and make decisions more effectively, ultimately leading to better, more personalized treatment options for patients. By combining advanced data analysis with practical, easy-to-use tools, this research bridges the gap between complex data and real-world clinical care.
Recommended Citation
Deng, Gansen, "Statistical Learning Methods for Challenges arised from Self-Reported Data" (2025). Electronic Thesis and Dissertation Repository. 10805.
https://ir.lib.uwo.ca/etd/10805