Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Doctor of Philosophy


Computer Science


Sedig, Kamran


Electronic health record systems (EHRs) facilitate the storage, retrieval, and sharing of patient health data; however, the availability of data does not directly translate to support for the tasks that healthcare providers encounter every day. In recent years, healthcare providers have come to rely on the large volume of clinical data stored in EHRs to perform various complex, data-intensive tasks. The overwhelming volume of clinical data stored in EHRs and a lack of support for the execution of EHR-driven tasks are but a few of the problems healthcare providers face while working with EHR-based systems. Thus, there is a demand for computational systems that can facilitate the performance of complex tasks that involve working with the vast amount of data stored in EHRs. Visual analytics (VA) offers great promise in handling such information overload challenges by integrating advanced analytics techniques with interactive visualizations. The user-controlled environment that VA systems provide allows healthcare providers to guide the analytics techniques in analyzing and managing EHR data through interactive visualizations.

The goal of this research is to demonstrate how VA systems can be designed systematically to support the performance of complex EHR-driven tasks. To this end, we present an activity and task analysis framework to analyze EHR-driven tasks in the context of interactive visualization systems. We also conduct a systematic literature review of EHR-based VA systems and identify the primary dimensions of the VA design space, which we use to evaluate these systems and identify gaps. Two novel EHR-based VA systems (SUNRISE and VERONICA) are then designed to bridge these gaps. SUNRISE incorporates frequent itemset mining, extreme gradient boosting, and interactive visualizations to allow users to interactively explore the relationships between laboratory test results and a disease outcome. The other proposed system, VERONICA, uses a representative set of supervised machine learning techniques to find the group of features with the strongest predictive power and makes the analytic results accessible through an interactive visual interface. We demonstrate the usefulness of these systems through a usage scenario involving acute kidney injury, using large provincial healthcare databases from Ontario, Canada, stored at ICES.

Summary for Lay Audience

Many medical organizations adopt electronic health record systems (EHRs) to replace traditional paper-based patient records as they modernize their operations. EHR data includes patients’ medical history, medications, diagnoses, treatment plans, and laboratory test results. Healthcare professionals use EHR-based systems to perform various tasks that involve working with the vast amount of data stored in EHRs. Such tasks include identifying patients at high risk of developing diseases, monitoring a patient’s condition, and studying the effect of treatments, among others. Despite their benefits, EHR systems fail to meet healthcare professionals’ computational needs. Therefore, there is a need for computational tools that can support the execution of various tasks on the large bodies of data in EHRs. This research aims to demonstrate the usefulness of one class of such tools, known as visual analytics (VA), in performing different tasks on EHRs. VA combines the strength of data analytics techniques with interactive visualizations to allow healthcare professionals to explore and analyze clinical data interactively. Using a proposed framework, we first identify the gaps in the support for tasks performed with EHR-based systems. We then provide a comprehensive overview of EHR-based VA systems through a systematic literature review. We evaluate these systems based on the tasks, analytics, visualizations, and interactions they support and identify the areas with little prior work. We develop two novel VA systems (SUNRISE and VERONICA) to show how the VA approach can be used to address the challenges of EHRs. SUNRISE is designed to help healthcare professionals identify relationships between laboratory test results and a disease. VERONICA uses several analytics techniques to find the most representative group of features for identifying high-risk patients. We show how these VA systems can be used to solve real-world problems using healthcare datasets from Ontario, Canada, stored at ICES.

Included in

Data Science Commons