
A Visual Analytics System for Investigating Multimorbidity Using Supervised Machine Learning
Abstract
Patterns of multimorbidity are complex and difficult to summarize using static visualization techniques such as tables and charts. We present a visual analytics system that helps users make sense of data collected from patients with multimorbidity. The system reveals underlying patterns in the data visually and interactively, enabling users to assess both prevalence and correlation estimates of different chronic diseases among multimorbid patients with varying characteristics. To do so, the system uses count-based conditional probability, binary logistic regression, softmax regression, and decision tree models to dynamically compute and visualize prevalence and correlation estimates for subsets of the data characterized by a user-selected set of pre-existing chronic conditions. The system also allows the user to examine how adjusting for characteristics such as age and gender affects both the prevalence estimates and the correlations among diseases. By dynamically changing the patient characteristics of interest and examining the resulting visualizations, the user can explore how prevalence and correlation estimates change with disease diagnosis and with other patient characteristics. This thesis thus contributes to the understanding of high-dimensional joint distributions of random variables, and the resulting system can be applied in any domain, such as economics, politics, or the social sciences, in which investigating the relationships among several random variables is vital to drawing sound conclusions.
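
As an illustration of the simplest of the estimators named above, the following minimal sketch computes a count-based conditional prevalence: the share of patients with a target disease among those who already have every condition in a user-selected set. The records, the condition names, and the function `conditional_prevalence` are hypothetical and chosen only for illustration; they are not taken from the system or its data.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy records (not real patient data): 1 = diagnosed, 0 = not diagnosed.
PATIENTS: List[Dict[str, int]] = [
    {"diabetes": 1, "hypertension": 1, "copd": 0},
    {"diabetes": 1, "hypertension": 0, "copd": 0},
    {"diabetes": 0, "hypertension": 1, "copd": 1},
    {"diabetes": 1, "hypertension": 1, "copd": 1},
    {"diabetes": 0, "hypertension": 0, "copd": 0},
]


def conditional_prevalence(
    records: Sequence[Dict[str, int]],
    target: str,
    given: Sequence[str],
) -> Optional[float]:
    """Estimate P(target = 1 | every condition in `given` = 1) by counting.

    Returns None when no record matches the selected conditions, because
    the estimate is undefined for an empty subset.
    """
    # Keep only the patients who have all of the selected pre-existing conditions.
    subset = [r for r in records if all(r[c] == 1 for c in given)]
    if not subset:
        return None
    # Prevalence is the fraction of that subset with the target disease.
    return sum(r[target] for r in subset) / len(subset)


# Prevalence of hypertension in the whole toy sample (no conditions selected).
overall = conditional_prevalence(PATIENTS, "hypertension", [])

# Prevalence of hypertension among patients who already have diabetes.
hypertension_given_diabetes = conditional_prevalence(
    PATIENTS, "hypertension", ["diabetes"]
)
```

Comparing the unconditional estimate with the conditional one is one simple way to see how prevalence shifts as conditions are added to the selected set. Adjusting for characteristics such as age and gender would require a regression model rather than raw counts, which this sketch does not attempt.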