
FEDERATED LEARNING FOR NATURAL LANGUAGE PROCESSING
Abstract
Natural Language Processing (NLP) uses Machine Learning (ML), particularly deep learning, to understand and generate human language. Since text data are often sensitive and distributed across devices, traditional centralized ML training raises privacy concerns. Federated Learning (FL) mitigates these risks by training models in a decentralized way while keeping data local. Although FL performs well with Independent and Identically Distributed (IID) data, its performance drops with non-IID data. For example, in sentiment analysis one client may hold only positive reviews while another holds only negative ones, leading to label imbalance, a common non-IID scenario.
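As a minimal illustration of such label imbalance (not part of the thesis itself), the sketch below partitions a binary sentiment dataset so that each simulated client is dominated by one label; the function name and the `skew` parameter are assumptions introduced here for clarity.

```python
import numpy as np

def label_skew_partition(labels, num_clients=10, skew=0.9, seed=0):
    """Assign sample indices to clients so each client is dominated by one label.

    `skew` is the fraction of a client's samples drawn from its dominant label
    (0.5 is roughly IID, 1.0 is fully separated labels). Assumes the two
    classes are roughly balanced overall.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pos = rng.permutation(np.where(labels == 1)[0]).tolist()
    neg = rng.permutation(np.where(labels == 0)[0]).tolist()
    per_client = len(labels) // num_clients
    clients = []
    for c in range(num_clients):
        dom, other = (pos, neg) if c % 2 == 0 else (neg, pos)
        n_dom = int(per_client * skew)
        take = dom[:n_dom] + other[:per_client - n_dom]
        del dom[:n_dom], other[:per_client - n_dom]
        clients.append(np.array(take))
    return clients  # list of index arrays, one per simulated client
```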
This research explores FL for sentiment analysis, focusing on how non-IID data affects ML models, including a Feed-Forward Neural Network (FFNN), Long Short-Term Memory (LSTM), and a Transformer. Results show that label imbalance degrades performance more than data size imbalance. Among the models, the Transformer is the most resilient to non-IID data: even under label imbalance it shows only a minimal accuracy drop, outperforming the FFNN and LSTM.
To enhance FL with non-IID data, Clustered FL (CFL) trains separate models for groups of clients with similar data distributions. Current evaluation techniques, such as Best-Fit Cluster Evaluation (BFCE), use test client labels for model selection even though those labels should be reserved strictly for evaluation, violating a core ML principle and inflating accuracy estimates. This research addresses the issue by proposing a new evaluation method that separates model selection from test data, showing that BFCE notably overestimates accuracy, particularly in non-IID scenarios.
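To make the issue concrete, the sketch below contrasts a BFCE-style estimate, where test labels drive which cluster model is reported, with a label-free selection step. `assign_fn` is a hypothetical placeholder for any rule that picks a cluster model from unlabeled inputs only; the abstract does not spell out the internals of the proposed evaluation method.

```python
import numpy as np

def bfce_accuracy(cluster_models, x_test, y_test):
    # BFCE as described: evaluate every cluster model on the test client's
    # labeled data and report the best score. Because the labels drive the
    # model choice, this is an optimistic estimate rather than an unbiased one.
    accs = [np.mean(m.predict(x_test) == y_test) for m in cluster_models]
    return max(accs)

def label_free_accuracy(cluster_models, assign_fn, x_test, y_test):
    # Label-free alternative: assign_fn (hypothetical) selects a cluster model
    # using only unlabeled inputs, so test labels are used solely for scoring.
    model = assign_fn(cluster_models, x_test)
    return np.mean(model.predict(x_test) == y_test)
```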
To eliminate reliance on test client labels in CFL, this research proposes an autoencoder-based unsupervised clustering method using K-means. Clients train autoencoders locally, and the extracted weights are used to cluster clients. An importance-driven aggregation scheme further improves stability by prioritizing clients near the cluster center. This label-free approach outperforms other techniques across non-IID scenarios, achieving 94.85% accuracy on highly skewed sentiment data, compared with 65.92% for traditional FL.
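A minimal sketch of this idea follows, assuming each client contributes the flattened weight vector of its locally trained autoencoder. The softmax-over-distance form of the importance weights and the `temperature` parameter are illustrative assumptions, not necessarily the thesis's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_clients(client_ae_weights, num_clusters=3, seed=0):
    # One flattened autoencoder weight vector per client; no labels involved.
    X = np.stack([np.ravel(w) for w in client_ae_weights])
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed).fit(X)
    return X, km.labels_, km.cluster_centers_

def importance_weights(X, labels, centers, cluster_id, temperature=1.0):
    # Importance-driven aggregation for one cluster: clients whose autoencoder
    # weights lie closer to the cluster centre receive larger aggregation
    # coefficients (softmax over negative distances).
    idx = np.where(labels == cluster_id)[0]
    d = np.linalg.norm(X[idx] - centers[cluster_id], axis=1)
    w = np.exp(-d / temperature)
    return idx, w / w.sum()
```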
Single-task NLP models are inefficient: each task requires separate training and its own memory footprint, which becomes a major obstacle on resource-constrained devices. EdgeDistilBERT, a proposed lightweight multi-task FL framework for such devices, tackles efficiency and privacy for related NLP tasks. It leverages self-supervised learning, knowledge distillation, context-aware aggregation, and low-rank adaptation. In experiments, it outperformed single-task models, improving sarcasm detection accuracy from 63.6% to 67.61% and sentiment analysis accuracy from 77.24% to 79.83%.
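As an illustrative sketch only, the following shows the kind of building blocks such a framework combines: a shared DistilBERT encoder with one lightweight head per task and a standard knowledge-distillation loss. The class layout, head names, and hyperparameters are assumptions, and context-aware aggregation and low-rank adaptation are not shown.

```python
import torch.nn as nn
import torch.nn.functional as F
from transformers import DistilBertModel

class MultiTaskStudent(nn.Module):
    """Shared DistilBERT encoder with one lightweight classification head per task."""
    def __init__(self, hidden=768, num_labels=2):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.heads = nn.ModuleDict({
            "sentiment": nn.Linear(hidden, num_labels),
            "sarcasm": nn.Linear(hidden, num_labels),
        })

    def forward(self, input_ids, attention_mask, task):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # representation at the first token
        return self.heads[task](cls)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Standard knowledge distillation: soft-target KL term against the teacher
    # plus hard-label cross-entropy, mixed by alpha.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```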