
Predicting Brain Metastasis Response To Stereotactic Radiosurgery Using Magnetic Resonance Imaging Radiomics And Machine Learning
Abstract
Brain metastases (BMs) indicate advanced cancer, so patients with BMs must be treated quickly and effectively while minimizing treatment toxicity. Stereotactic radiosurgery (SRS) treats BMs with conformal, ablative radiation doses, but failure rates are as high as 30%. A predictive model of BM progression post-SRS would therefore aid in BM treatment selection and SRS planning. Previous studies have used pre-treatment T1-weighted contrast-enhanced magnetic resonance imaging (T1w-CE MRI) to predict SRS outcomes through either quantitative radiomic analysis with machine learning (ML) or qualitative appearance analysis. How these methods compare is not well understood, and ML methods have not been evaluated for sensitivity to relevant clinical factors, robust model interpretability, or multi-centre external validation. To meet these needs, a dataset of 123 BMs across 99 SRS patients was used to develop T1w-CE MRI radiomics-based ML models and to assess their sensitivity to clinical factors. An ML model using radiomic and clinical features obtained the highest area under the receiver operating characteristic curve (AUC), 0.77, and this model was sensitive to primary cancer site, BM volume, and MRI scanner model. BM volume sensitivity was reduced by removing volume-correlated radiomic features. An observer study of BM qualitative analysis revealed high interobserver variability that in turn limited outcome prediction, while ML models provided enhanced stratification of BMs into risk groups for post-SRS progression (Kaplan-Meier rank-sum p = 0.0003). BM qualitative appearance was useful for ML model interpretation, revealing that the radiomic signature of post-SRS progression is tied to necrotic or heterogeneous BM appearance, which may indicate a less radiosensitive, hypoxic environment. An additional dataset of 117 BMs across 62 SRS patients was collected at a second centre for external validation.
A locked model transferred between centres performed poorly, but restricting the model to radiomic features that were important at both centres increased the AUC to 0.70. Retraining a model on the second centre’s dataset, using a locked methodology developed on the first centre’s dataset, achieved a higher AUC of 0.80. In conclusion, this work characterized radiomics-based ML model performance with respect to clinical factors and BM qualitative appearance, while also providing the ML model interpretability and external validation necessary to motivate future research and clinical translation.