Rate distortion optimization for adaptive gradient quantization in federated learning

Guojun Chen, Kaixuan Xie, Wenqiang Luo, Yinfei Xu, Lun Xin, Tiecheng Song, Jing Hu

Research article
2024, Vol. 10, Issue 6: 1813-1825. DOI: 10.1016/j.dcan.2024.01.005

Abstract

Federated Learning (FL) is an emerging machine learning framework designed to preserve privacy. However, the continual uploading of model updates over throughput-limited uplink channels incurs a heavy communication overhead, which is a major challenge for FL. To address this issue, we propose an adaptive gradient quantization approach that improves communication efficiency. To minimize the total communication cost, we exploit both the correlation of gradients across local clients and the correlation of gradients across communication rounds, i.e., in the space and time dimensions. The compression strategy is grounded in rate-distortion theory, which allows us to find an optimal quantization strategy for the gradients. To further reduce the computational complexity, we incorporate a Kalman filter into the proposed approach. Finally, numerical results demonstrate the effectiveness and robustness of the proposed rate-distortion-optimized adaptive gradient quantization approach, which significantly reduces communication costs compared with other quantization methods.
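To make the idea concrete, below is a minimal sketch of rate-distortion-guided adaptive quantization: the per-round bit-width is chosen from the Gaussian rate-distortion bound R(D) = (1/2) log2(sigma^2 / D), using the current gradient variance as sigma^2. This is an illustrative assumption only; the function names (`gaussian_rate`, `adaptive_quantize`), the uniform scalar quantizer, and the per-round variance estimate are hypothetical stand-ins, and the paper's actual scheme, which exploits time/space gradient correlations and a Kalman filter, is not reproduced here.

```python
import numpy as np

def gaussian_rate(variance, distortion):
    """Gaussian rate-distortion function under squared error:
    R(D) = 0.5 * log2(sigma^2 / D) bits per sample, zero once D >= sigma^2."""
    if distortion >= variance:
        return 0.0
    return 0.5 * np.log2(variance / distortion)

def quantize_uniform(x, bits, scale):
    """Uniform quantizer with 2**bits levels spanning [-scale, scale]."""
    levels = 2 ** bits
    step = 2.0 * scale / levels
    indices = np.clip(np.round(x / step), -(levels // 2), levels // 2 - 1)
    return indices * step

def adaptive_quantize(gradient, target_distortion, min_bits=2, max_bits=16):
    """Choose the smallest bit-width whose rate meets the Gaussian R(D)
    bound for the current gradient variance, then quantize.  A uniform
    scalar quantizer does not achieve the bound exactly; this is only a
    proxy for the paper's rate-distortion optimization."""
    variance = float(np.var(gradient))
    rate = gaussian_rate(variance, target_distortion)
    bits = int(np.clip(np.ceil(rate), min_bits, max_bits))
    scale = float(np.max(np.abs(gradient))) or 1.0
    return quantize_uniform(gradient, bits, scale), bits

# Toy round: low-variance gradients get few bits, cutting uplink cost.
rng = np.random.default_rng(0)
g = rng.normal(0.0, 0.1, size=1000)      # stand-in for one client's gradient
g_hat, bits = adaptive_quantize(g, target_distortion=1e-4)
print(f"allocated {bits} bits/sample, empirical MSE = {np.mean((g - g_hat)**2):.2e}")
```

In the paper's setting, the target distortion itself would be adapted per round by the rate-distortion optimization, with a Kalman filter tracking the gradient statistics at low computational cost.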

Keywords

Federated learning / Communication efficiency / Adaptive quantization / Rate distortion

Cite this article

Guojun Chen, Kaixuan Xie, Wenqiang Luo, Yinfei Xu, Lun Xin, Tiecheng Song, Jing Hu. Rate distortion optimization for adaptive gradient quantization in federated learning. Digital Communications and Networks, 2024, 10(6): 1813-1825. DOI: 10.1016/j.dcan.2024.01.005


