Electronic Thesis and Dissertation Repository

A Framework for Characterising Performance in Multi-Class Classification Problems with Applications in Cancer Single Cell RNA Sequencing

Erik R. Christensen, The University of Western Ontario

Abstract

Many real-world problems require multi-class classifiers to correctly identify all classes in a dataset. Evaluating the performance of such classifiers requires taking several parameters into account. I created a framework for examining the differences between algorithms in specific scenarios and for comparing multiple classifiers more effectively. This allows researchers to better identify the strengths and weaknesses of particular classifiers. Single-cell RNA sequencing (scRNA-seq) allows cancer researchers to define complex cell types (i.e. classes) in the tumour microenvironment (TME). Using eight datasets, I assessed the performance of 26 methods from different perspectives, such as the ability to identify under-represented or imbalanced classes, or to identify distinct but related subgroups that have not previously been seen within a population. This study can be used to select the best methods for multi-class classification of complex datasets, such as scRNA-seq TME datasets, and it provides avenues for future work.