Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article

Degree

Doctor of Philosophy

Program

Statistics and Actuarial Sciences

Supervisor

Yi, Grace Y.

Abstract

In this thesis, we employ statistical modeling and methods to examine COVID-19 data, and we develop new methods to address issues, such as error-contaminated covariates, that invalidate some standard methods.

In the first study, we employ semiparametric and nonparametric survival models as well as data visualization techniques to examine the epidemiological features of COVID-19. Based on our numerical results, the median incubation time is about 5 days, and older patients are more likely to have longer incubation periods.

In the second study, we use data from 175 countries to investigate possible factors associated with the case fatality rate (CFR) of COVID-19. The Q-learning algorithm is employed to assess optimal preventive policies adopted by individual countries to reduce their COVID-19 CFRs. The data analysis suggests that, in addition to addressing traditional risk factors, policymakers should tailor the strictness of preventive policies to country-specific characteristics and the evolving situation to alleviate the risk of death from COVID-19.

The third study investigates the effects of misclassified covariates in developing dynamic treatment regimes with the Q-learning approach. We present two procedures to account for the bias induced by covariate misclassification. The satisfactory performance of these procedures is demonstrated through extensive simulation studies.

The fourth study deals with mixed measurement error and misclassification in covariates within the context of Q-learning with a compound outcome. We demonstrate that such measurement inaccuracies can pose significant challenges to accurate estimation in Q-learning. To address this issue, we propose correction strategies that alleviate the impact of mismeasurement.

Summary for Lay Audience

This thesis explores various statistical and reinforcement learning methods to gain insights into epidemiological characteristics and effective containment measures related to COVID-19. However, caution must be exercised when interpreting the results from such analyses due to the use of error-contaminated data. Consequently, as a complementary remedy, we develop procedures for addressing such complications.

In the first study, we use a dataset covering January 22, 2020 to March 29, 2020 to examine epidemiological characteristics of COVID-19. We use survival analysis techniques to quantify how the recovery time may be associated with age and gender. Using data visualization and text mining tools, we study incubation times, the fatality rate, and the most common symptoms. Based on our numerical results, the median incubation time is about 5 days, and older patients are more likely to have longer incubation periods. Furthermore, we find that the median recovery time for infected patients is about 20 days, and there is no gender difference in recovery times.

In the second study, we use data from 175 countries, covering January 13, 2020 to March 9, 2021, to investigate possible factors associated with the case fatality rate of COVID-19. The Q-learning algorithm is employed to assess optimal preventive policies adopted by individual countries to reduce their COVID-19 case fatality rates. The data analysis suggests that, in addition to addressing traditional risk factors, policymakers should tailor the strictness of preventive policies to country-specific characteristics and the evolving situation to alleviate the risk of death from COVID-19.

The third study investigates the effects of misclassified covariates in developing dynamic treatment regimes with the Q-learning approach. We present two procedures to account for the bias induced by covariate misclassification. The satisfactory performance of these procedures is demonstrated through extensive simulation studies.

The fourth study deals with mixed measurement error and misclassification in covariates within the context of Q-learning with a compound outcome. We demonstrate that such measurement inaccuracies can pose significant challenges to accurate estimation in Q-learning. To address this issue, we propose correction strategies that alleviate the impact of mismeasurement.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Available for download on Sunday, August 11, 2024
