Electronic Thesis and Dissertation Repository

Degree

Master of Science

Program

Computer Science

Supervisor

Dr. Lucian Ilie

2nd Supervisor

Dr. Peter Rogan

Co-Supervisor

Abstract

The goal of this thesis was to examine different machine learning techniques for predicting chemotherapy response in cell lines and patients based on gene expression. After trying regression, multi-class classification, and binary classification, we concluded that binary classification was the best method for training models, given the limited size of the available cell line data. Support vector machine classifiers trained on cell line data were easier to use and produced better results than neural networks. Sequential backward feature selection selected genes that produced good results, but as a greedy algorithm it has limitations. Genetic algorithms and simulated annealing selected genes that produced better results on both cell lines and patients. Combining cell line data sets from different types of cancer produced models that performed well at predicting outcome in both cell lines and patients, indicating that the mechanism of action of chemotherapy drugs is similar across different types of cancer. Using machine learning models trained on cell lines to predict patient chemotherapy response shows great promise; however, future studies need to acquire larger cell line data sets and find better ways of evaluating how well the predictive ability of cell-line-trained models transfers to patients.
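
To illustrate the greedy feature selection named in the abstract, the following is a minimal sketch of sequential backward feature selection in Python. It is not the thesis implementation: the thesis used support vector machine classifiers, whereas this sketch swaps in a simple nearest-centroid classifier scored by leave-one-out accuracy so that it runs with the standard library alone. The function names, the toy expression values, and the labels (0 = resistant, 1 = sensitive) are all hypothetical.

```python
from statistics import mean


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature (gene) indices."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        best_class, best_dist = None, float("inf")
        for cls in sorted(set(labels)):
            # Build the class centroid without the held-out sample i.
            members = [r for j, (r, y) in enumerate(zip(rows, labels))
                       if j != i and y == cls]
            if not members:
                continue
            centroid = [mean(m[f] for m in members) for f in features]
            dist = sum((row[f] - c) ** 2 for f, c in zip(features, centroid))
            if dist < best_dist:
                best_class, best_dist = cls, dist
        correct += best_class == label
    return correct / len(rows)


def backward_select(rows, labels, n_keep):
    """Start from all features and repeatedly drop the one whose
    removal leaves the highest score, until n_keep remain."""
    selected = list(range(len(rows[0])))
    while len(selected) > n_keep:
        candidates = [[f for f in selected if f != drop] for drop in selected]
        selected = max(candidates,
                       key=lambda fs: loo_accuracy(rows, labels, fs))
    return selected


# Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise.
ROWS = [
    [0.0, 0.5, 0.2],
    [1.0, 0.1, 0.9],
    [0.5, 0.8, 0.4],
    [10.0, 0.4, 0.3],
    [11.0, 0.9, 0.8],
    [10.5, 0.2, 0.1],
]
LABELS = [0, 0, 0, 1, 1, 1]  # assumed coding: 0 = resistant, 1 = sensitive

best_single_gene = backward_select(ROWS, LABELS, n_keep=1)
```

Each step commits to the single best removal and never revisits it, which is the greedy limitation the abstract refers to. Genetic algorithms and simulated annealing instead search over whole gene subsets.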
