Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Doctor of Philosophy

Program

Statistics and Actuarial Sciences

Collaborative Specialization

Biostatistics

Supervisor

He, Wenqing

Abstract

This research is motivated by a prostate cancer imaging study conducted at the University of Western Ontario, which aims to classify cancer status using multiple in-vivo images. The histological images and the in-vivo images are subject to misalignment during co-registration, which can be viewed as measurement error in either the covariates or the response. We investigate three methods to correct for this problem.

The first proposed method corrects the predicted class probability when the data contain misclassified labels. The correction equation is derived from the relationship between the true response and the error-prone response. The probability of the observed class label is adjusted so that it is close to the probability of the true label. A model can then be built from the corrected class probability and the covariates for prediction.
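As a minimal sketch of this kind of correction, assume a binary label whose misclassification rates are known and do not depend on the covariates. In that case P(Y*=1|x) = (1 - gamma10) P(Y=1|x) + gamma01 (1 - P(Y=1|x)), which can be inverted for the true-label probability. The function name, the parameters gamma01 and gamma10, and the constant-rate assumption are illustrative only; the correction equation derived in the thesis may differ.

```python
def correct_probability(p_observed, gamma01, gamma10):
    """Recover P(Y=1|x) from P(Y*=1|x) under constant misclassification rates.

    gamma01 = P(Y*=1 | Y=0) and gamma10 = P(Y*=0 | Y=1) are assumed known
    (hypothetical inputs, not quantities defined in the source).
    """
    denom = 1.0 - gamma01 - gamma10
    if denom <= 0:
        # The labels carry no usable signal when gamma01 + gamma10 >= 1.
        raise ValueError("gamma01 + gamma10 must be less than 1")
    p_true = (p_observed - gamma01) / denom
    # Clip to [0, 1], since estimated probabilities can fall outside the
    # range implied by the misclassification model.
    return min(1.0, max(0.0, p_true))
```

For example, a true probability of 0.6 with gamma01 = 0.1 and gamma10 = 0.2 gives an observed probability of 0.8 * 0.6 + 0.1 * 0.4 = 0.52, and the function maps 0.52 back to 0.6.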

The second method is a weighted model approach for constructing classifiers from an error-prone response. Each data point is assigned a weight according to its position, which indicates the reliability of that point. We propose weighted versions of several machine learning classifiers, including logistic regression, support vector machines (SVM), k-nearest neighbours (KNN) and classification trees. The weighted model incorporates the weight of each instance in the model-building procedure, and the weighted classifiers trained on the error-prone data can be used for future prediction.
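One way such weights can enter a classifier is sketched below for KNN: each neighbour's vote is scaled by its reliability weight, so that unreliable labels count for less. This is an illustrative assumption about how the weighting might work, not the formulation used in the thesis; the function name and arguments are hypothetical.

```python
import math


def weighted_knn_predict(train_x, train_y, weights, query, k=3):
    """Predict a label for `query` by a reliability-weighted vote of the
    k nearest training points (Euclidean distance)."""
    # Rank training points by distance to the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = {}
    for i in order[:k]:
        # Each neighbour contributes its reliability weight, not a unit vote.
        votes[train_y[i]] = votes.get(train_y[i], 0.0) + weights[i]
    return max(votes, key=votes.get)


# Toy data: two low-reliability points labelled 0 sit next to a
# high-reliability point labelled 1.
train_x = [(0.0,), (0.1,), (0.2,), (1.0,)]
train_y = [1, 0, 0, 1]
reliability = [0.9, 0.2, 0.2, 0.9]
prediction = weighted_knn_predict(train_x, train_y, reliability, (0.05,))
```

With equal weights the two points labelled 0 would win the vote; with the reliability weights above, the single reliable point labelled 1 outweighs them.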

The misalignment in the co-registration procedure can also be treated as measurement error in the covariates. For this setting, we propose a weighted data reconstruction method to deal with the corrupted covariates. The method combines two forms of moment reconstruction, each derived under a different assumption about the error. We incorporate the weights of the data to build adjusted variables that replace the error-prone covariates. Classifiers can then be trained on the reconstructed data set.
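For orientation, the sketch below shows basic moment reconstruction for a single covariate under a classical additive error model, W = X + U, with a known error variance. Within each class, the observed value is shrunk toward the class mean so that the adjusted values have the class-conditional mean and variance of the true covariate. The error model, the known error variance and all names are assumptions made for illustration; the weighted combination of two reconstruction forms proposed in the thesis is not reproduced here.

```python
import math
from statistics import fmean, pvariance


def moment_reconstruct(w, y, error_var):
    """Return adjusted values for the error-prone covariate `w`.

    Assumes W = X + U with Var(U) = error_var (hypothetical, known), so that
    E(X|Y) = E(W|Y) and Var(X|Y) = Var(W|Y) - error_var.
    """
    adjusted = [0.0] * len(w)
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        values = [w[i] for i in idx]
        mean_w = fmean(values)
        var_w = pvariance(values)
        # Implied variance of the true covariate, floored at zero.
        var_x = max(var_w - error_var, 0.0)
        # Shrinkage factor G(y) = sqrt(Var(X|Y) / Var(W|Y)).
        g = math.sqrt(var_x / var_w) if var_w > 0 else 0.0
        for i in idx:
            adjusted[i] = mean_w * (1.0 - g) + w[i] * g
    return adjusted


# Toy example with two classes and an assumed error variance of 0.75.
w_obs = [1.0, 3.0, 10.0, 14.0]
labels = [0, 0, 1, 1]
x_adj = moment_reconstruct(w_obs, labels, error_var=0.75)
```

The adjusted values keep each class mean unchanged while reducing the within-class variance by the assumed error variance.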

Numerical studies were carried out to assess the performance of each method, and the methods were applied to the prostate cancer imaging study. The results show that all three methods substantially alleviate the misalignment problem.

Summary for Lay Audience

This research investigates three methods to improve the accuracy of prostate cancer detection from medical images when the image data are measured with error.

Prostate cancer is the most common cancer among Canadian men, but current detection methods suffer from low accuracy and high variability. Using medical images such as MRI to build statistical models that predict cancer status is a promising solution. The prostate cancer imaging research team at the University of Western Ontario collected image data for this purpose, but the data contain measurement error. The error can be viewed either as incorrect cancer labels (the response) or as corrupted image intensity measurements (the covariates). Previous studies have shown that both kinds of measurement error reduce prediction performance.

The first method we propose models the relationship between the true cancer status and the mislabelled status. Through this relationship, we can correct the predicted cancer label.

For the second method, we define the reliability of each data point by its position in the medical image. This reliability is a probability that reflects how likely the point is to have been measured correctly. We propose combining this reliability measure with the statistical models so that the new models are less vulnerable to measurement error.

Finally, we propose combining the reliability of the data with the moment reconstruction method of Freedman et al. (2004). Moment reconstruction creates an "adjusted" value for the error-corrupted covariate such that the adjusted value is close to the true value. The form of the reconstruction depends on the assumed type of error. We found that the measurement error in the prostate images corresponds to two different error types, and that the reliability reflects how likely each error type is. We combine the two error types to create the adjusted values for the covariates, with the proportion for each error type determined by the reliability.

The simulation studies and the real data application show that the proposed methods significantly improve prediction performance.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Biostatistics Commons