Doctor of Philosophy
Electrical and Computer Engineering
Internet of Things (IoT) systems have emerged as a prevalent technology in our daily lives. With the widespread adoption of sensors and smart devices in recent years, the volume and speed of data generation in IoT systems have increased dramatically. In most IoT systems, massive volumes of data must be processed, transformed, and analyzed frequently to enable various IoT services and functionalities. Machine Learning (ML) approaches have shown their capacity for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many challenges. The first challenge is to process large amounts of dynamic IoT data to make accurate and informed decisions. The second challenge is to automate and optimize the data analytics process. The third challenge is to protect IoT devices and systems against various cyber threats and attacks. To address these challenges, this thesis proposes ML-based frameworks and data analytics approaches for several applications.
Specifically, the first part of the thesis provides a comprehensive review of applying Automated Machine Learning (AutoML) techniques to IoT data analytics tasks. It discusses all procedures of the general ML pipeline. The second part of the thesis proposes several novel supervised ML-based Intrusion Detection Systems (IDSs) to improve the security of Internet of Vehicles (IoV) systems and connected vehicles. Optimization techniques are used to obtain optimized ML models with high attack detection accuracy. The third part of the thesis develops unsupervised ML algorithms to identify network anomalies and malicious network entities (e.g., attacker IPs, compromised machines, and polluted files/content) to protect Content Delivery Networks (CDNs) from service-targeting attacks, including distributed denial-of-service and cache pollution attacks. The proposed framework is evaluated on real-world CDN access log data to illustrate its effectiveness. The fourth part of the thesis proposes adaptive online learning algorithms that address concept drift (i.e., data distribution changes) and handle dynamic IoT data streams effectively in order to provide reliable IoT services. The proposed drift-adaptive learning methods adapt to data distribution changes and avoid degradation in the performance of data analytics models.
Summary for Lay Audience
Internet of Things (IoT) systems have emerged as a prevalent technology in our daily lives. With the widespread adoption of sensors and smart devices in recent years, the volume and speed of data generation in IoT systems have increased dramatically. In most IoT systems, massive volumes of data must be processed, transformed, and analyzed frequently to enable various IoT services and functionalities. Machine learning (ML) is a subfield of Artificial Intelligence (AI) that enables machines to learn useful information and patterns from data without being explicitly programmed. ML algorithms have emerged as a promising technique for rapidly and accurately processing the massive volumes of data produced by IoT systems and identifying the patterns required by IoT services.
This thesis focuses on the use of ML algorithms in IoT data analytics and cyber-security applications. To improve ML models’ learning performance and reduce human effort in data analytics tasks, Automated Machine Learning (AutoML) and optimization techniques are studied and developed in this thesis to automatically obtain optimized ML models with the best performance. This thesis is divided into four parts. The first part provides a comprehensive review of applying AutoML techniques to IoT data analytics tasks. The second part proposes several novel ML-based intrusion detection techniques to identify various types of common network attacks in vehicle networks and protect connected vehicles. The third part develops ML algorithms to identify network anomalies (i.e., cyber-attacks) and malicious network entities (e.g., attacker IPs and compromised machines) in massive real-world network log data to protect Content Delivery Networks (CDNs), an essential network for Internet traffic communications, against cyber-attacks. The fourth part proposes adaptive online learning algorithms for addressing concept drift in dynamic IoT data streams. Concept drift refers to changes in data distributions, often triggered by unpredictable events such as the COVID-19 pandemic, that degrade data analytics performance. The proposed methods can effectively handle ever-changing data stream patterns to provide reliable IoT services.
Yang, Li, "Optimized and Automated Machine Learning Techniques Towards IoT Data Analytics and Cybersecurity" (2022). Electronic Thesis and Dissertation Repository. 8734.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.