FedBS: Solving data heterogeneity issue in federated learning using balanced subtasks

Chuxiao Su, Jing Wu, Rui Zhang, Zi Kang, Hui Xia, Cheng Zhang

High-Confidence Computing, 2025, Vol. 5, Issue 4: 100322. DOI: 10.1016/j.hcc.2025.100322

Research Article

Abstract

Federated learning has emerged as a popular paradigm for distributed machine learning, enabling participants to collaborate on model training while preserving the privacy of their local data. However, a key challenge in deploying federated learning in real-world applications is the substantial heterogeneity of local data distributions across participants, which can degrade the performance of the aggregated model. To address this issue, we propose a novel approach that decomposes the skewed original task into a series of relatively balanced subtasks. This decomposition allows us to derive unbiased feature extractors for the subtasks, which are then used to solve the original task. Based on this idea, we develop the FedBS algorithm. Comparative experiments on several datasets show that FedBS outperforms traditional federated learning algorithms such as FedAvg and FedProx in terms of accuracy, convergence speed, and robustness. These improvements arise because FedBS addresses data heterogeneity by decomposing the original task into smaller, more balanced subtasks, thereby mitigating imbalance during model training more effectively.
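The decomposition idea can be sketched in code. The snippet below is only an illustrative sketch, not the paper's FedBS implementation: it assumes a label-skewed classification task, pairs each locally frequent class with a rare one to form roughly balanced two-class subtasks, and trains one small feature extractor per subtask whose features can later be reused by a classifier for the original task. The helper names (make_balanced_subtasks, SubtaskExtractor) are hypothetical.

# Illustrative sketch of "balanced subtasks" (hypothetical helpers, not the
# authors' FedBS code): split a label-skewed task into roughly balanced
# two-class subtasks and give each subtask its own small feature extractor.
import numpy as np
import torch.nn as nn


def make_balanced_subtasks(labels):
    """Pair each locally frequent class with a rare one, so every two-class
    subtask sees a roughly balanced number of samples per class.
    (For simplicity, an odd class count leaves the median class unassigned.)"""
    classes, counts = np.unique(labels, return_counts=True)
    order = classes[np.argsort(-counts)]                  # most -> least frequent
    return [[int(order[i]), int(order[-1 - i])] for i in range(len(order) // 2)]


class SubtaskExtractor(nn.Module):
    """Small per-subtask feature extractor with its own subtask head; the
    backbone features z can be concatenated across subtasks and fed to a
    final classifier for the original, skewed task."""

    def __init__(self, in_dim, feat_dim, n_subtask_classes):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, n_subtask_classes)

    def forward(self, x):
        z = self.backbone(x)
        return self.head(z), z          # subtask logits, reusable features

In a federated round, each client would train such extractors on its balanced subtasks before aggregation; the subtask construction and aggregation rule actually used by FedBS are described in the full paper.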

Keywords

Distributed machine learning / Federated learning / Data heterogeneity / Non-IID data / Unbalanced data

Cite this article

Chuxiao Su, Jing Wu, Rui Zhang, Zi Kang, Hui Xia, Cheng Zhang. FedBS: Solving data heterogeneity issue in federated learning using balanced subtasks. High-Confidence Computing, 2025, 5(4): 100322. DOI: 10.1016/j.hcc.2025.100322


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is supported by the National Key Research and Development Program of China (2024YFB3311802), the National Natural Science Foundation of China (NSFC) (62172377), the Taishan Scholars Program of Shandong Province (tsqn202312102), and the Startup Research Foundation for Distinguished Scholars (202112016).
