Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article

Degree

Doctor of Philosophy

Program

Electrical and Computer Engineering

Supervisor

Dr. Abdallah Shami

2nd Supervisor

Dr. Ali Bou Nassif

Co-Supervisor

Abstract

The rapid growth of the Internet and related technologies has led to the collection of large amounts of data by individuals, organizations, and society in general [1]. However, this often leads to information overload, which occurs when the amount of input (e.g., data) a human is trying to process exceeds their cognitive capacity [2]. Machine learning (ML) has been proposed as one methodology capable of extracting useful information from large sets of data [1]. This thesis focuses on two applications. The first is education, namely e-Learning environments. Within this field, this thesis proposes different optimized ML ensemble models to predict students’ performance at earlier stages of course delivery. Experimental results showed that the proposed optimized ML ensemble models accurately identified the weak students who needed help. More specifically, these models achieved an accuracy of up to 96% in the binary case and 93.1% in the multi-class case. The second application is network security intrusion detection. Within this field, this thesis proposes different optimized ML classification frameworks that use a variety of optimization modeling algorithms and heuristics to improve the performance of intrusion detection systems (IDSs) through anomaly detection while maintaining or reducing their time complexity. Experimental results showed that the developed models reduced the training sample size by up to 74%, reduced the feature set size by almost 60%, and improved the detection accuracy by up to 2%. This thesis can be divided into two main parts. The first part analyzes different educational datasets and proposes different optimized ML classification ensemble models that accurately predict weak students who may need help. The second part proposes optimized ML classification frameworks that accurately detect network attacks while maintaining a low false alarm rate and low time complexity.
Notably, the developed models and frameworks can be generalized as follows:

  • The optimized ML ensemble models proposed in the first part of this thesis can be generalized to many applications such as finance, network security, social media, and healthcare systems.
  • The optimized ML classification models proposed in the second part of this thesis can be generalized to other applications that typically generate datasets that are large in both the number of instances and the size of the feature set.

Summary for Lay Audience

The rapid growth of the Internet and related technologies has led to the collection of large amounts of data by individuals, organizations, and society in general. However, these large amounts of data often lead to information overload, which occurs when the amount of input (e.g., data) that a human is trying to process exceeds their cognitive capacity. In turn, this can lead to humans ignoring, overlooking, or misinterpreting crucial information. Machine learning (ML) has been proposed as one data analysis and prediction methodology capable of extracting useful information from large sets of data. ML allows computers to learn without being explicitly programmed. The computer can then apply what it has learned to find the same patterns in similar data. Furthermore, ML allows computer systems to adapt and learn from their experience. This thesis focuses on two applications. The first is education, namely e-Learning environments. Within this field, this thesis proposes the use of different optimized ML models to predict students’ performance at earlier stages of course delivery. The second application is network security intrusion detection. Within this field, this thesis proposes different optimized ML classification frameworks that use a variety of optimization modeling algorithms and heuristics to improve the performance of intrusion detection systems (IDSs) through anomaly detection while maintaining or reducing their time complexity.
