Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Doctor of Philosophy

Program

Electrical and Computer Engineering

Supervisor

Katarina Grolinger

Abstract

Natural Language Processing (NLP) uses Machine Learning (ML), particularly deep learning, to understand and generate human language. Since text data are often sensitive and distributed across devices, traditional centralized ML training raises privacy concerns. Federated Learning (FL) mitigates privacy risks by allowing decentralized model training while keeping data local. Although FL performs well with Independent and Identically Distributed (IID) data, its performance drops with non-IID data. For example, in sentiment analysis, one client may have only positive reviews while another has only negative ones—leading to label imbalance, a common non-IID scenario.
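
To make the label-imbalance scenario concrete, the following sketch (illustrative only; the function name, skew parameter, and toy data are assumptions, not from the thesis) partitions a toy binary sentiment dataset so that each simulated client is dominated by a single label:

    import numpy as np

    rng = np.random.default_rng(0)

    def partition_label_skew(labels, n_clients, skew=0.9, n_per_client=100):
        # Give each client n_per_client samples, a `skew` fraction of which
        # come from one dominant label (clients may share samples; this is
        # a toy illustration, not a disjoint split).
        labels = np.asarray(labels)
        classes = np.unique(labels)
        clients = []
        for i in range(n_clients):
            dominant = classes[i % len(classes)]  # e.g., client 0 mostly positive
            n_dom = int(skew * n_per_client)
            dom = rng.choice(np.where(labels == dominant)[0], n_dom, replace=False)
            oth = rng.choice(np.where(labels != dominant)[0], n_per_client - n_dom, replace=False)
            clients.append(np.concatenate([dom, oth]))
        return clients

    labels = rng.integers(0, 2, size=1000)  # toy binary sentiment labels
    for i, idx in enumerate(partition_label_skew(labels, n_clients=4)):
        print(f"client {i}: label counts = {np.bincount(labels[idx])}")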

This research explores FL for sentiment analysis, focusing on how non-IID data affects ML models, including the Feed-Forward Neural Network (FFNN), Long Short-Term Memory (LSTM), and the transformer. Results show that label imbalance impacts performance more than data size imbalance. Among the models, the transformer is the most resilient to non-IID data. Even under label imbalance, the transformer shows minimal accuracy drop, outperforming FFNN and LSTM.

To enhance FL with non-IID data, Clustered FL (CFL) trains separate models for groups of clients with similar data distributions. Current evaluation techniques, such as Best-Fit Cluster Evaluation (BFCE), use test client labels for model selection, even though those labels should be reserved strictly for evaluation; this violates the ML principle of train-test separation and inflates accuracy estimates. This research addresses the issue by proposing a new evaluation method that separates model selection from test data, showing that BFCE notably overestimates accuracy, particularly in non-IID scenarios.
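
A minimal sketch of the leakage (interfaces assumed: scikit-learn-style models with a .predict() method and a hypothetical assign_fn, none of which are the thesis code): BFCE scores every cluster model on the test labels and reports the best one, so the same labels drive both selection and evaluation, whereas a label-free scheme must pick the cluster from the inputs alone.

    import numpy as np

    def bfce_accuracy(cluster_models, x_test, y_test):
        # BFCE-style evaluation: the best cluster model is chosen using
        # y_test itself, so the reported accuracy is optimistic.
        accs = [np.mean(m.predict(x_test) == y_test) for m in cluster_models]
        return max(accs)

    def label_free_accuracy(cluster_models, assign_fn, x_test, y_test):
        # Label-free evaluation: assign_fn (hypothetical) picks a cluster
        # from the inputs only; y_test is used solely for scoring.
        model = cluster_models[assign_fn(x_test)]
        return np.mean(model.predict(x_test) == y_test)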

To eliminate reliance on test client labels in CFL, this research proposes an autoencoder-based unsupervised clustering method using K-means. Clients train autoencoders locally, and the extracted weights are used for clustering. An importance-driven aggregation further improves stability by prioritizing clients near the cluster center. This label-free approach outperforms other techniques across non-IID scenarios, achieving 94.85% accuracy on highly skewed sentiment data compared to 65.92% with traditional FL.
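
A sketch of this pipeline under assumptions (NumPy and scikit-learn; function names and the inverse-distance weighting are ours, not necessarily the thesis implementation): clients are clustered by their flattened autoencoder weights, and aggregation weights grow as a client's weights get closer to its cluster center.

    import numpy as np
    from sklearn.cluster import KMeans

    def flatten_weights(weight_arrays):
        # Concatenate a client's autoencoder weight arrays into one vector.
        return np.concatenate([w.ravel() for w in weight_arrays])

    def cluster_clients(client_weights, n_clusters):
        # Cluster clients by their locally trained autoencoder weights.
        feats = np.stack([flatten_weights(w) for w in client_weights])
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(feats)
        return km.labels_, km.cluster_centers_, feats

    def importance_weights(feats, labels, centers, cluster_id, eps=1e-8):
        # Importance-driven aggregation: clients closer to the cluster
        # center receive larger (inverse-distance, normalized) weights.
        members = np.where(labels == cluster_id)[0]
        dists = np.linalg.norm(feats[members] - centers[cluster_id], axis=1)
        w = 1.0 / (dists + eps)
        return members, w / w.sum()

    # Toy demo: six clients whose "autoencoder weights" form two groups.
    rng = np.random.default_rng(0)
    clients = [[rng.normal(loc=i % 2, size=(8, 4)), rng.normal(loc=i % 2, size=4)]
               for i in range(6)]
    labels, centers, feats = cluster_clients(clients, n_clusters=2)
    members, agg_w = importance_weights(feats, labels, centers, cluster_id=0)
    print("cluster labels:", labels, "| aggregation weights:", agg_w.round(3))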

Single-task NLP models are inefficient: each task requires separate training and its own memory, which becomes a major challenge on resource-constrained devices. EdgeDistilBERT, a proposed lightweight multi-task FL framework designed for such devices, tackles efficiency and privacy in related NLP tasks. It leverages self-supervised learning, knowledge distillation, context-aware aggregation, and low-rank adaptation. In experiments, it outperformed single-task models, improving sarcasm detection accuracy from 63.6% to 67.61% and sentiment analysis from 77.24% to 79.83%.
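
A conceptual sketch of two of these ingredients (PyTorch; hyperparameters and class names are assumptions, not the thesis implementation): the soft-target knowledge-distillation loss, and a LoRA-style linear layer in which the base weight is frozen and only the small low-rank factors are trained, shrinking what each device must store and communicate.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
        # Knowledge distillation: KL between temperature-softened teacher
        # and student distributions (scaled by T^2), mixed with hard-label CE.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1 - alpha) * hard

    class LoRALinear(torch.nn.Module):
        # Low-rank adaptation: the base weight is frozen; only the small
        # factors A and B are trained (and, in FL, communicated).
        def __init__(self, base: torch.nn.Linear, r=8, lora_alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False
            self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = lora_alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale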

Summary for Lay Audience

Sentiment analysis involves understanding people's emotions expressed in text, such as social media posts and customer reviews. Companies, researchers, and policymakers use it to track opinions, improve products, and make informed decisions. However, training Artificial Intelligence (AI) models for sentiment analysis usually requires gathering large amounts of personal data from different locations in one place, which creates privacy concerns.

Federated Learning (FL) is a newer AI technique that allows multiple devices (such as smartphones) to train a model without sharing personal data. Instead of sending data to a central server, devices learn locally and share only the learned knowledge with the server. This helps protect privacy but introduces a challenge: data across devices can vary in quantity, distribution, or feature characteristics. This uneven distribution (called non-IID data) makes training less effective, reducing accuracy.

This research first explores different AI models, including various neural networks, and finds that one model type, the transformer, handles imbalanced data better than the others. Performance can be further improved through clustering, where similar devices are grouped together and a model is trained for each cluster. However, traditional clustering methods require test device labels, which may be unavailable or, when used, can inflate accuracy estimates and lead to misleading results.

To address this, this research introduces a better way to form device groups using Autoencoders (AE) and K-means clustering. This method creates clusters without relying on data labels, ensuring that FL models are formed according to machine learning principles and properly evaluated. Another improvement is an importance-driven aggregation strategy, where devices with more representative data contribute more to the learning process.

To further optimize FL for smaller, low-power devices, the study introduces EdgeDistilBERT, a lightweight AI model for sentiment analysis and sarcasm detection. Using techniques such as self-supervised learning, knowledge distillation, context-aware aggregation, and parameter-efficient fine-tuning (LoRA), EdgeDistilBERT handles two tasks simultaneously and improves accuracy while reducing communication costs between devices.
