Electrical and Computer Engineering Publications

Document Type

Article

Publication Date

2024

Journal

IEEE Access

First Page

1

Last Page

13

URL with Digital Object Identifier

https://doi.org/10.1109/ACCESS.2024.3453068

Abstract

In sentiment analysis, data are commonly distributed across many devices, and traditional machine learning requires transferring these data to a central location, exposing them to security and privacy risks. Federated Learning (FL) avoids this transfer by training a model without requiring the clients/devices to share their local data; however, FL performance drops when data are not Independent and Identically Distributed (non-IID), such as when label distribution or data size varies across clients. Although techniques for handling non-IID data have been proposed, primarily in the image domain, the sensitivity of various deep learning models to non-IID data still needs to be examined. Consequently, this paper investigates the sensitivity of three dominant techniques in sentiment analysis (feed-forward neural networks, LSTMs, and transformers) to common types of non-IID data, specifically data size and label imbalances. The scenarios were designed with increasing degrees of imbalance in data size and label distribution to investigate gradual changes. The results revealed that label imbalance has a higher impact on accuracy than data size imbalance, irrespective of the algorithm. Overall, the transformer achieved the highest accuracy, and, while all models experienced a drop in accuracy with increased label imbalance, this drop was smaller for the transformer, making it well suited for non-IID data.
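
To make the setup concrete, below is a minimal Python sketch (not the paper's code) of the two ingredients the abstract describes: partitioning a labeled sentiment dataset across clients with a tunable degree of label imbalance, and a server aggregation step that averages client weights without accessing raw data. The Dirichlet-based partitioning and the FedAvg-style weighted average are common conventions assumed here for illustration; the paper's exact scenario construction and aggregation rule may differ.

```python
# Minimal sketch of (1) non-IID client partitioning and (2) FedAvg-style
# aggregation. Dirichlet partitioning and FedAvg are assumptions chosen for
# illustration, not necessarily the paper's exact method.
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_label_partition(labels, n_clients, alpha):
    """Split sample indices across clients; smaller alpha => stronger label imbalance."""
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return [np.array(ci) for ci in client_idx]

def fedavg(client_weights, client_sizes):
    """Server step: average client parameter vectors, weighted by local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Toy demonstration: 3 sentiment classes, 5 clients, increasing label imbalance.
labels = rng.integers(0, 3, size=3000)
for alpha in (100.0, 1.0, 0.1):  # near-IID -> strongly non-IID
    parts = dirichlet_label_partition(labels, n_clients=5, alpha=alpha)
    counts = [np.bincount(labels[p], minlength=3).tolist() for p in parts]
    print(f"alpha={alpha}: per-client label counts {counts}")

# Aggregate dummy "model weights" (flat parameter vectors) across clients.
parts = dirichlet_label_partition(labels, n_clients=5, alpha=1.0)
client_weights = [rng.normal(size=4) for _ in range(5)]
print("aggregated:", fedavg(client_weights, [len(p) for p in parts]))
```

Running the loop with decreasing alpha shows the per-client label counts drifting from roughly uniform to heavily skewed, which is the gradual non-IID progression the abstract's scenarios are built around.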

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
