Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Doctor of Philosophy

Program

Statistics and Actuarial Sciences

Supervisor

McLeod, Ian

Abstract

This thesis deals with the problem of classification in general, with a particular focus on heavy-tailed or skewed data. The classification problem is first formalized within statistical learning theory, and several important classification methods are reviewed; among them, distance-based classifiers, including the median-based classifier and the quantile-based classifier (QC), are especially useful for heavy-tailed or skewed inputs. However, QC is limited by its model capacity and by the accumulation of errors in high dimensions. The objective of this study is to investigate more general methods that retain the merits of QC.

We present four extensions of QC, which appear in chronological order and preserve the ideas driving our research. The first extension, the ensemble quantile classifier (EQC), treats QC as a base learner in ensemble learning to increase model capacity and introduces weight decay regularization to mitigate the accumulation of errors in high dimensions. The second extension, the multiple quantile classifier (MQC), enhances the model capacity of EQC by allowing multiple quantile-difference transformations for each variable. The third extension, the factorized multiple quantile classifier (FMQC), adds higher-order interactions to MQC via a computationally efficient approach based on adaptive factorization machines. The fourth extension, the deep multiple quantile classifier (DeepMQC), embeds MQC in the flexible framework of deep neural networks and opens up further possibilities for application to various tasks. We discuss the theoretical motivation for each method. Numerical studies on synthetic and real datasets demonstrate the improvement offered by the proposed methods.
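The quantile-based classifier underlying these extensions can be sketched as follows. This is a minimal illustration, assuming the componentwise quantile-discrepancy rule commonly used for QC: each class is summarized by its per-variable theta-quantiles, and a point is assigned to the class whose quantiles it is closest to under an asymmetric distance. The class and function names are illustrative, not the thesis's own implementation.

```python
import numpy as np

def quantile_discrepancy(x, q, theta):
    """Asymmetric distance of x from quantile q at level theta
    (reduces to 0.5 * |x - q|, the median classifier, when theta = 0.5)."""
    return (theta + (1 - 2 * theta) * (x < q)) * np.abs(x - q)

class QuantileClassifier:
    """Minimal sketch of a quantile-based classifier (QC)."""

    def __init__(self, theta=0.5):
        self.theta = theta

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Per-class, per-variable theta-quantiles of the training data.
        self.quantiles_ = np.stack(
            [np.quantile(X[y == k], self.theta, axis=0) for k in self.classes_]
        )
        return self

    def predict(self, X):
        # Sum discrepancies over variables; assign the closest class.
        dists = np.stack(
            [quantile_discrepancy(X, q, self.theta).sum(axis=1)
             for q in self.quantiles_]
        )
        return self.classes_[np.argmin(dists, axis=0)]
```

Because each variable contributes a separate discrepancy term, the rule handles skewed inputs variable by variable, but the summation over many variables is also the source of the high-dimensional accumulated errors that the extensions address.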

Summary for Lay Audience

Classification is ubiquitous in real life, for example in determining whether an email is spam or whether it will rain tomorrow. Many classification methods have been developed for different purposes. In particular, we are interested in the quantile-based classifier (QC), one of the more recent classification methods. QC performs well when the data contain skewed variables. For example, we may want to classify a person's BMI category based on their family income; family income can be a skewed variable if some families are extremely wealthy compared to the majority. In this research, we point out several limitations of QC and propose four extensions that progressively address these problems and enhance predictive ability.

In the first extension, the ensemble quantile classifier (EQC), we use a meta-learner, that is, another classification method, to combine the outputs of QC. This is a kind of ensemble learning, an idea first derived from the phenomenon popularly known as the Wisdom of the Crowd. In the second extension, the multiple quantile classifier (MQC), we further adjust EQC to allow more realistic hypotheses. In the third extension, we provide a computationally efficient way of incorporating variable interactions into MQC. In the fourth extension, we integrate the aforementioned methods with deep learning, which has succeeded in many domains, including image and speech recognition. Numerical studies on synthetic and real datasets demonstrate the improvement offered by the proposed methods.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
