
Electrical and Computer Engineering Publications
Document Type
Article
Publication Date
2024
Journal
IEEE Access
First Page
1
Last Page
13
URL with Digital Object Identifier
https://doi.org/10.1109/ACCESS.2024.3453068
Abstract
In sentiment analysis, data are commonly distributed across many devices, and traditional machine learning requires transferring these data to a central location, exposing them to security and privacy risks. Federated Learning (FL) avoids this transfer by training a model without requiring the clients/devices to share their local data; however, FL performance drops when data are not Independent and Identically Distributed (non-IID), such as when label distributions or data sizes vary across clients. Although techniques for handling non-IID data have been proposed, primarily in the image domain, the sensitivity of various deep learning models to non-IID data still needs to be examined. Consequently, this paper investigates the sensitivity of three dominant techniques in sentiment analysis, namely feed-forward neural networks, LSTMs, and transformers, to common types of non-IID data, specifically data size and label imbalances. The scenarios were designed with increasing degrees of imbalance in data size and label distribution to investigate gradual changes. The results revealed that label imbalance has a higher impact on accuracy than data size imbalance, irrespective of the algorithm. Overall, the transformer achieved the highest accuracy, and, while all models experienced a drop in accuracy with increased label imbalance, this drop was smaller for the transformer, making it well suited for non-IID data.
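To make the setting concrete, the sketch below simulates federated training under the kind of label imbalance the paper studies. It is a minimal illustration, not the authors' implementation: a NumPy logistic-regression model stands in for the paper's feed-forward/LSTM/transformer classifiers, the Dirichlet-based partition (controlled by alpha) is one common way to induce label imbalance across clients, and all names, data, and parameters here are assumptions.

```python
# Minimal FedAvg sketch with label-imbalanced (non-IID) clients.
# Hypothetical example: not the paper's models, data, or exact splits.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class "sentiment" data: 2-D features, binary labels.
n, d = 2000, 2
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

def dirichlet_split(y, n_clients=5, alpha=0.5):
    """Partition sample indices across clients with Dirichlet label
    imbalance. Smaller alpha -> more skewed labels per client."""
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)
        # Fraction of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part)
    return [np.array(c) for c in clients]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(w, Xc, yc, lr=0.1, epochs=5):
    """A few epochs of full-batch gradient descent on one client;
    the raw data never leave the client."""
    for _ in range(epochs):
        grad = Xc.T @ (sigmoid(Xc @ w) - yc) / len(yc)
        w = w - lr * grad
    return w

clients = dirichlet_split(y, n_clients=5, alpha=0.5)
w = np.zeros(d)
for rnd in range(20):  # communication rounds
    # Each client trains locally on its own shard.
    local_ws = [local_update(w.copy(), X[idx], y[idx]) for idx in clients]
    # Server aggregates: average weighted by client data size (FedAvg).
    sizes = np.array([len(idx) for idx in clients], dtype=float)
    w = np.average(local_ws, axis=0, weights=sizes)

acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(f"global model accuracy: {acc:.3f}")
```

Decreasing alpha skews each client's label distribution further, mirroring the paper's "increasing degrees of imbalance", while the size-weighted average is the standard FedAvg aggregation step.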
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Included in
Artificial Intelligence and Robotics Commons, Computer Engineering Commons, Data Science Commons, Electrical and Computer Engineering Commons