Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Doctor of Philosophy


Electrical and Computer Engineering

Collaborative Specialization

Artificial Intelligence


Essex, Aleksander


The rapid growth in connectivity among different types of networks raises significant privacy and security concerns. At the same time, the exponential expansion of cloud deployments has produced massive amounts of data from a variety of applications, resources, and platforms. The rate and volume at which high-dimensional data is created have in turn begun to pose significant challenges for data management and security. Handling redundant and irrelevant features in high-dimensional space has been a long-standing challenge for network anomaly detection. Eliminating such features with spectral information not only speeds up classification but also helps classifiers make accurate decisions at attack recognition time, especially when coping with large-scale, heterogeneous data such as network traffic. Furthermore, the continued evolution of network attack patterns has led to the emergence of zero-day cyber attacks, which are now considered a major challenge in cyber security. In this threat environment, traditional security protections such as firewalls, anti-virus software, and virtual private networks are not always sufficient. Moreover, most current intrusion detection systems (IDSs) are either signature-based, an approach that has proven insufficient for identifying novel attacks, or developed using obsolete datasets. A robust mechanism for detecting intrusions in the big data setting, i.e., an anomaly-based IDS, has therefore become an important research topic. In this dissertation, an empirical study is first conducted to identify the challenges and limitations of current IDSs, providing a systematic treatment of methodologies and techniques. Next, a comprehensive IDS framework is proposed to overcome these shortcomings.
First, a novel hybrid dimensionality reduction technique is proposed that combines information gain (IG) and principal component analysis (PCA) with an ensemble classifier built on three different classification techniques, named IG-PCA-Ensemble. Experimental results show that the proposed dimensionality reduction method contributes more critical features and significantly reduces detection time. The results also show that the IG-PCA-Ensemble approach outperforms the majority of existing state-of-the-art approaches.
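
The overall shape of such a pipeline can be illustrated with a minimal sketch. It is not the implementation evaluated in this dissertation: it assumes scikit-learn, uses mutual information as a stand-in for information gain, and picks three base classifiers (logistic regression, k-nearest neighbours, and a decision tree) purely for illustration, since the three techniques are not named here. The function name `build_ig_pca_ensemble`, the parameter values, and the synthetic data are likewise hypothetical.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_ig_pca_ensemble(n_selected=20, n_components=10, seed=0):
    """Hypothetical helper: IG-style filtering, then PCA, then a voting ensemble."""
    # Mutual information between each feature and the class label,
    # used here as an assumed proxy for information gain.
    ig_score = partial(mutual_info_classif, random_state=seed)

    # Three base learners chosen only for illustration (assumption).
    ensemble = VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
            ("tree", DecisionTreeClassifier(random_state=seed)),
        ],
        voting="hard",  # majority vote over the predicted labels
    )

    return Pipeline(
        [
            ("ig", SelectKBest(score_func=ig_score, k=n_selected)),  # drop low-scoring features
            ("scale", StandardScaler()),  # PCA is sensitive to feature scale
            ("pca", PCA(n_components=n_components, random_state=seed)),  # project to fewer dimensions
            ("ensemble", ensemble),
        ]
    )


# Synthetic stand-in for labelled traffic records (not a real IDS dataset).
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, n_redundant=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = build_ig_pca_ensemble()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
reduced = model[:-1].transform(X_test)  # output of the reduction stages only
print(f"features: {X.shape[1]} -> {reduced.shape[1]}, test accuracy: {accuracy:.3f}")
```

In this sketch the filter stage removes features that carry little information about the label before PCA compresses what remains, so the ensemble trains and predicts on 10 dimensions instead of 40. The values of `n_selected` and `n_components` are placeholders that would need tuning on real traffic data.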

Summary for Lay Audience

An intrusion detection system is considered a fundamental security tool in computer systems due to its sophisticated capabilities for combating potential cyber attacks. In essence, it monitors network traffic for unusual or malicious activity and sends an alert to the administrator when such activity is discovered. Detecting such malicious activity has been a subject of study for decades. As data scientists can appreciate, however, when the scale of a problem grows by an order of magnitude, existing approaches are often no longer effective; the problem becomes different enough that it requires a new solution altogether. More specifically, classical security methods such as firewalls, malware prevention, data encryption, and user authentication form a necessary but incomplete set of tools for securing computers and networks against today's attacks. Hence, additional lines of defense, such as artificial intelligence, have become a rapidly growing area of interest. The work presented in this dissertation proposes an innovative solution to these challenges. Several data mining techniques are used, including a novel hybrid dimensionality reduction method and supervised and unsupervised machine learning techniques, to provide a robust mechanism for detecting network intrusions. The experimental results show that the proposed dimensionality reduction method contributes more critical features, allowing the proposed model to achieve better performance at lower computational cost than state-of-the-art methods.