Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article

Degree

Master of Science

Program

Computer Science

Supervisor

Fang Fang

Abstract

Federated learning (FL), as a privacy-preserving learning paradigm, enables multiple users to collaboratively train a global model without sharing their local data. However, the data distribution among user devices is often non-independent and identically distributed (non-IID), making efficient global aggregation of local models difficult. Moreover, when conducting FL over wireless networks, the training and communication efficiency of FL is often restricted by the limited computational and communication resources of user devices. Existing methods generally focus either on alleviating the impact of non-IID data or on developing more efficient resource allocation schemes, but rarely address both aspects simultaneously. In this thesis, we propose a novel user selection scheme for knowledge distillation-based global aggregation that selects users whose local models can be more efficiently aggregated. Subsequently, by investigating the impact of user resource allocation on FL performance over wireless networks, we propose a resource allocation scheme for the selected users to improve the training and communication efficiency of FL. To further improve communication efficiency in FL, a knowledge distillation-based algorithm is proposed to directly reduce the communication overhead between the server and user devices (e.g., the exchange of global model parameters) without compromising the accuracy of the global model. Finally, extensive experiments demonstrate that our proposed scheme and algorithm achieve superior performance in terms of accuracy and training and communication efficiency.
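The following is a minimal, hypothetical sketch of the kind of knowledge distillation-based aggregation and user selection the abstract describes; it is not the thesis's exact algorithm. It assumes each user evaluates its local model on a shared public batch and sends only the resulting logits, and that the server selects the users whose soft predictions lie closest (in KL divergence) to the ensemble average before averaging them into distillation targets. All function names and the selection rule here are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=2.0):
    # Temperature-scaled softmax: higher T yields softer distributions,
    # the standard trick in knowledge distillation.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # Mean KL(p || q) over samples; used here as a divergence score.
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1))

def select_users(user_logits, k, T=2.0):
    # Hypothetical selection rule: keep the k users whose soft predictions
    # on the shared public batch are closest to the ensemble average.
    soft = [softmax(l, T) for l in user_logits]
    ensemble = np.mean(soft, axis=0)
    scores = [kl_divergence(ensemble, s) for s in soft]
    return np.argsort(scores)[:k]  # lowest divergence first

def distill_targets(user_logits, selected, T=2.0):
    # Server-side aggregation: average the selected users' soft labels;
    # these become distillation targets for updating the global model,
    # so only logits (not model parameters) cross the network.
    return np.mean([softmax(user_logits[i], T) for i in selected], axis=0)

# Toy usage: 5 users, a shared public batch of 8 samples, 10 classes.
rng = np.random.default_rng(0)
logits = [rng.normal(size=(8, 10)) for _ in range(5)]
chosen = select_users(logits, k=3)
targets = distill_targets(logits, chosen)
```

Exchanging logits on a small public batch rather than full model parameters is one way a distillation-based scheme can cut per-round communication, which is the efficiency gain the abstract refers to.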

Summary for Lay Audience

Federated Learning (FL) allows multiple devices, such as smartphones or IoT gadgets, to collaboratively train a global learning model without exchanging their private data. This decentralized framework enhances user privacy, as each device processes data locally and shares only model updates. However, effective FL implementation faces two major challenges: data heterogeneity and resource constraints. Since each device collects its own unique data, this heterogeneity can reduce the overall model's accuracy. Additionally, the limited computational and communication resources of many devices hinder the efficient coordination and performance of FL. This thesis introduces several methods to overcome these challenges and optimize FL in wireless networks. First, a user selection approach based on knowledge distillation is proposed. This approach strategically selects users whose data and models contribute to improved aggregation quality. Next, a resource allocation scheme is designed to optimize each device's computational power and operating frequency, enabling devices to participate effectively without exceeding their energy limits. Finally, a novel communication-efficient FL framework is proposed to reduce the amount of data transferred during training rounds, which decreases network communication overhead and speeds up model training. Extensive experiments demonstrate that these techniques significantly improve FL accuracy, efficiency, and convergence speed in scenarios with diverse data and limited resources. This research advances FL as a viable solution for privacy-preserving machine learning in practical applications where data privacy and resource constraints are critical, such as healthcare, finance, and smart home systems.
