Thesis Format
Monograph
Degree
Master of Engineering Science
Program
Electrical and Computer Engineering
Collaborative Specialization
Artificial Intelligence
Supervisor
Samarabandu, Jagath
Abstract
Corporate networks are constantly bombarded by malicious actors trying to gain access. The current state of the art in protecting networks is deep learning-based intrusion detection systems (IDS). However, for an IDS to be effective, it needs to be trained on a good dataset. The best datasets for training an IDS are real data captured from large corporate networks. Unfortunately, companies cannot release their network data due to privacy concerns, creating a lack of public cybersecurity data. In this thesis I take a novel approach to network dataset anonymization: character-level LSTM models learn the characteristics of a dataset and then generate a new, anonymized, synthetic dataset with similar characteristics to the original. This method shows excellent performance when tested for characteristic preservation and anonymization performance on three datasets: one that includes malicious and benign URLs, one with DNS packets, and one with malicious and benign TCP packets. Using this method I take the first step toward addressing the lack of published private network datasets.
Summary for Lay Audience
Corporate networks are constantly bombarded by hackers trying to gain access. The current state of the art in protecting networks is artificial intelligence (AI)-driven intrusion detection systems (IDS). However, for an IDS to be effective, it needs to learn what a hacker's network activity looks like from a good dataset. The best datasets for training an IDS are real data from large corporate networks. Unfortunately, companies cannot release their network data due to privacy concerns, creating a lack of public cybersecurity data. In this thesis I take a novel approach to network dataset anonymization: AI learns the characteristics of a dataset and then generates a new, anonymized, synthetic dataset with similar characteristics to the original. This method is tested for characteristic preservation and anonymization performance on three datasets: one that includes malicious and benign website addresses, one with DNS packets, and one with malicious and benign TCP packets. The results showed that the AI was able to learn the structure and composition of these datasets and then generate its own synthetic, anonymized versions of them. Using this AI-driven approach I take the first step toward addressing the lack of publicly available private network datasets for training IDSs.
Recommended Citation
Vecile, Spencer K., "Anonymization & Generation of Network Packet Datasets Using Deep learning" (2022). Electronic Thesis and Dissertation Repository. 8792.
https://ir.lib.uwo.ca/etd/8792
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License