Electronic Thesis and Dissertation Repository


Doctor of Philosophy


Statistics and Actuarial Sciences


Dr. Wenqing He


Microarray technology is essentially a tool for measuring gene expression, and its measurements are subject to error. Gene expression levels can be used as predictors of patient survival, yet the measurement error they carry is often ignored in published analyses of microarray data. Statistical methods are needed that analyze microarray data without ignoring this error. A typical microarray data set contains far more genes than samples, and proper selection of survival-relevant genes contributes to an accurate prediction model. We study the effect of measurement error on the selection of survival-relevant genes under the accelerated failure time (AFT) model by regularizing the weighted least squares estimator with an adaptive LASSO penalty. Simulation results and a real data analysis show that ignoring measurement error affects survival-relevant gene selection. The simulation-extrapolation (SIMEX) method is investigated as a way to adjust for the impact of measurement error on gene selection. The model obtained after adjustment is more accurate than the model selected when measurement error is ignored.
Microarray experiments are often performed over a long period of time, and samples may be prepared and collected under different conditions. Moreover, different protocols or methodologies may be applied in the experiments. All of these factors can lead to heteroscedastic measurement error in a microarray data set. Because it is of practical importance to combine microarray data from different labs or platforms, we construct an AFT prediction model using data with heterogeneous covariate measurement error. Two variations of the SIMEX algorithm are investigated to adjust for the effect of the mismeasured covariates. Simulation results show that the proposed method achieves better prediction accuracy than the naive method, which ignores measurement error.
In this dissertation, the SIMEX method is used to adjust for the effects of covariate measurement error. This method has advantages over other conventional methods: it is more robust to distributional assumptions about the error-prone covariates, and it offers marked simplicity and flexibility for practical use. To implement this method, we developed an R package for general users.
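The basic SIMEX idea can be illustrated with a minimal sketch. This is not the dissertation's AFT and adaptive LASSO procedure, nor its R package: it assumes a simple linear regression with one covariate, classical additive measurement error with a known standard deviation `sigma_u`, and a quadratic extrapolation function. The function name `simex_slope`, its parameters, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=200, seed=0):
    """Illustrative SIMEX correction of an OLS slope (hypothetical helper).

    w       : covariate observed with additive error, w = x + u
    y       : response
    sigma_u : standard deviation of the measurement error (assumed known)
    Returns (naive_slope, simex_slope).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)

    def ols_slope(x):
        # Least squares slope of y on x.
        xc = x - x.mean()
        return float(xc @ (y - y.mean()) / (xc @ xc))

    # lambda = 0 corresponds to the naive estimate on the observed data.
    grid = [0.0]
    means = [ols_slope(w)]

    # Simulation step: inflate the error variance to (1 + lambda) * sigma_u^2
    # by adding extra noise, and average the naive estimate over replicates.
    for lam in lambdas:
        sims = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size))
            for _ in range(n_rep)
        ]
        grid.append(lam)
        means.append(float(np.mean(sims)))

    # Extrapolation step: fit a quadratic in lambda and evaluate it at
    # lambda = -1, which corresponds to no measurement error.
    coefs = np.polyfit(grid, means, deg=2)
    return means[0], float(np.polyval(coefs, -1.0))


# Simulated example with a true slope of 2 (assumed data, not from the thesis).
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)                      # true covariate
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
w = x + 0.5 * rng.standard_normal(n)            # error-prone measurement

naive, corrected = simex_slope(w, y, sigma_u=0.5)
print(f"naive slope: {naive:.3f}, SIMEX slope: {corrected:.3f}")
```

In this setting the naive slope is attenuated toward zero, and the extrapolated estimate moves back toward the true value. The quadratic extrapolant is only an approximation, so some bias usually remains.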