Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Doctor of Philosophy


Medical Biophysics


Ward, Aaron D.

2nd Supervisor

Hajdok, George



Brain metastases (BMs) represent advanced cancer, so BM patients must be treated quickly and effectively while minimizing treatment toxicities. Stereotactic radiosurgery (SRS) uses conformal, ablative radiation doses to treat BMs, but failure rates are as high as 30%. A predictive model of BM progression post-SRS would therefore aid in BM treatment selection and SRS planning. Previous studies have used pre-treatment T1-weighted contrast-enhanced magnetic resonance imaging (T1w-CE MRI) to predict SRS outcomes, through quantitative radiomic analysis with machine learning (ML) and through qualitative appearance analysis. How these methods compare is not well understood, and ML methods have not been studied for sensitivity to relevant clinical factors, robust model interpretability, or multi-centre external validation. To meet these needs, a dataset of 123 BMs across 99 SRS patients was used to develop T1w-CE MRI radiomics-based ML models and to characterize their sensitivity to clinical factors. An ML model using radiomic and clinical features obtained the highest area under the receiver operating characteristic curve (AUC) of 0.77, and this model was sensitive to primary cancer site, BM volume, and MRI scanner model. BM volume sensitivity was reduced by removing volume-correlated radiomic features. An observer study of BM qualitative analysis revealed high interobserver variability that in turn limited outcome prediction, while ML models provided enhanced stratification of BMs into risk groups for post-SRS progression (Kaplan-Meier rank-sum p = 0.0003). BM qualitative appearance was useful for ML model interpretation, revealing that the radiomic signature of post-SRS progression is tied to necrotic or heterogeneous BM appearance, which may indicate a less radiosensitive, hypoxic environment. An additional dataset of 117 BMs across 62 SRS patients was collected at a second centre for external validation. Transferring a locked model between centres yielded poor performance, but limiting the model to radiomic features important at both centres increased the AUC to 0.70. Retraining a model on the second centre’s dataset, using a locked methodology developed with the first centre’s dataset, achieved a higher AUC of 0.80. In conclusion, this work characterized radiomics-based ML model performance with respect to clinical factors and BM qualitative appearance, while also providing the ML model interpretability and external validation necessary to motivate future research and clinical translation.
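
One step mentioned in the abstract, removing volume-correlated radiomic features to reduce sensitivity to BM volume, can be illustrated with a minimal sketch. This is not the thesis's implementation: the Pearson correlation criterion, the 0.8 threshold, the function names, the feature names, and the toy values are all assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant feature carries no linear relation to volume
    return cov / (spread_x * spread_y)


def drop_volume_correlated(features, volume, threshold=0.8):
    """Keep only features whose |correlation| with volume is below threshold.

    `features` maps a feature name to one value per BM; `threshold` is a
    hypothetical cut-off, not a value taken from the thesis.
    """
    return {
        name: values
        for name, values in features.items()
        if abs(pearson(values, volume)) < threshold
    }


# Toy data (hypothetical): five BMs with volumes and two radiomic features.
volume = [0.5, 1.2, 2.0, 3.1, 4.4]
features = {
    "surface_area": [1.1, 2.3, 4.1, 6.0, 8.9],      # rises with volume
    "texture_entropy": [3.2, 2.9, 3.5, 2.8, 3.3],   # largely independent
}

kept = drop_volume_correlated(features, volume)
print(sorted(kept))
```

In this toy example the size-like feature is filtered out and the texture feature is retained. In practice the correlation measure and cut-off would need to be chosen and validated on the training data only.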

Summary for Lay Audience

Brain metastases (BMs) occur when a patient’s cancer spreads to their brain. BMs cause painful symptoms and even death, and so must be treated quickly and effectively, while minimizing side effects. Stereotactic radiosurgery (SRS) uses radiation to destroy BMs, but SRS may fail. To aid in decision-making on how best to use SRS, a system using pre-treatment BM data as input could be developed to predict SRS failure. Magnetic resonance imaging (MRI) provides images of the brain and BMs, which clinicians can use to qualitatively score BM appearance to predict SRS failure. BM MRI can also undergo computerized extraction of quantitative “radiomic” data, which machine learning (ML) systems use to learn how to predict SRS failure. Both approaches are useful, but they need robust comparison. How ML systems make predictions is not well understood, nor is their sensitivity to variability in the input data. ML systems must also be validated to work at multiple cancer centres. This research used a dataset of 123 BMs to create predictive ML systems. It was found that variability in where a patient’s cancer started, BM volume, and model of MRI scanner were important to consider. Next, each BM’s qualitative appearance was scored by multiple clinicians. High scoring variability was found, showing the difficulty of using qualitative analysis. Since ML systems are computerized, they avoid this variability, and they also better divided BMs into groups at risk for SRS failure. BM qualitative appearance was then used to better interpret how the ML systems were making decisions. It was found that the BMs the ML systems predicted would fail SRS were also the BMs labeled with qualitative appearances that indicate low BM oxygen levels, a condition that reduces SRS effectiveness. Lastly, we gathered another dataset of 117 BMs from a different cancer centre. The ML systems were most accurate when a separate system was built for each centre, showing that our method for building ML systems is validated across multiple centres. These advances in understanding, comparing, and validating predictive ML systems are all important in moving such systems closer to helping patients receive the best care possible.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.