Accelerating decentralized federated learning via momentum GD with heterogeneous delays
Na Li, Hangguan Shan, Meiyan Song, Yong Zhou, Zhongyuan Zhao, Howard H. Yang, Fen Hou
High-Confidence Computing, 2025, Vol. 5, Issue 4: 100310
Federated learning (FL) with synchronous model aggregation suffers from the straggler issue because of heterogeneous transmission and computation delays among agents. In mobile wireless networks, this issue is exacerbated by the time-varying network topology caused by agent mobility. Although asynchronous FL can alleviate the straggler issue, it still faces critical challenges in algorithm design and convergence analysis because of dynamic information update delays (IU-Delays) and dynamic network topology. To tackle these challenges, we propose a decentralized FL framework based on gradient descent with momentum, named decentralized momentum federated learning (DMFL). We prove that DMFL converges globally on convex loss functions under bounded time-varying IU-Delays, as long as the network topology is uniformly jointly strongly connected. Moreover, DMFL imposes no restrictions on the data distribution over agents. Extensive experiments verify DMFL's performance advantage over benchmark algorithms and reveal the effects of diverse parameters on its performance.
Decentralized federated learning / Gradient descent / Momentum / Information update delay / Convergence
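To make the abstract's core idea concrete, the following is a minimal, illustrative sketch (not the authors' exact DMFL algorithm) of a single agent's update in decentralized FL with momentum gradient descent under delayed neighbor information: the agent mixes possibly stale neighbor models with consensus weights and then takes a local momentum-GD step. The function name `dmfl_local_step` and arguments such as `stale_models` and `mixing_weights` are hypothetical placeholders.

```python
import numpy as np

def dmfl_local_step(model, momentum, stale_models, mixing_weights,
                    local_gradient, lr=0.01, beta=0.9):
    """One local update on a single agent (illustrative sketch).

    model         : current local parameter vector (np.ndarray)
    momentum      : momentum buffer from the previous step (np.ndarray)
    stale_models  : list of neighbor models received with heterogeneous delays
    mixing_weights: consensus weights over {self} + neighbors (nonnegative, sum to 1)
    local_gradient: gradient of the local loss evaluated at `model`
    """
    # Consensus step: mix the local model with delayed (stale) neighbor models.
    mixed = mixing_weights[0] * model
    for w, nbr_model in zip(mixing_weights[1:], stale_models):
        mixed = mixed + w * nbr_model

    # Momentum gradient-descent step applied to the mixed model.
    momentum = beta * momentum + local_gradient
    model = mixed - lr * momentum
    return model, momentum
```

In this sketch each agent proceeds with whatever neighbor models it has most recently received, so no agent waits on stragglers; the convergence guarantees discussed in the abstract (bounded IU-Delay, uniformly jointly strongly connected topology) are conditions on how stale those models may be and how often the communication graph connects the agents.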