Anderson Acceleration of Gradient Methods with Energy for Optimization Problems
Hailiang Liu, Jia-Hao He, Xuping Tian
Anderson acceleration (AA) is an extrapolation technique designed to speed up fixed-point iterations. For optimization problems, we propose a novel algorithm that combines AA with the energy adaptive gradient method (AEGD) [arXiv:2010.05109]. Although AEGD is not a fixed-point iteration, the feasibility of our algorithm is ensured by the convergence theory for AEGD. We also provide rigorous convergence rates for AA applied to gradient descent (GD), showing an improvement over GD by an acceleration factor equal to the gain at each implementation of AA. Our experimental results show that the proposed AA-AEGD algorithm requires little tuning of hyperparameters and exhibits superior convergence speed.
Keywords: Anderson acceleration (AA) · Gradient descent (GD) · Energy stability
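To make the acceleration step concrete, below is a minimal sketch of Anderson acceleration applied to plain gradient descent (AA-GD), treating one GD step as a fixed-point map and using the standard windowed least-squares (Type-II) AA update. The function name `aa_gd`, the step size `eta`, the window depth `m`, and the Tikhonov regularizer `reg` are illustrative assumptions rather than the authors' implementation; the paper's AA-AEGD method instead applies AA to the AEGD update of [arXiv:2010.05109], which is not reproduced here.

```python
import numpy as np

def aa_gd(grad, x0, eta=0.1, m=5, iters=200, reg=1e-10):
    """Illustrative Anderson-accelerated gradient descent (Type-II AA(m)).

    One GD step g(x) = x - eta*grad(x) is treated as a fixed-point map;
    each new iterate extrapolates over the last m+1 points. `reg` is a
    small Tikhonov term stabilizing the least-squares solve (assumption).
    """
    x = np.asarray(x0, dtype=float)
    xs, gs = [], []                       # histories of x_k and g(x_k)
    for _ in range(iters):
        gx = x - eta * grad(x)            # plain GD step (fixed-point map)
        xs.append(x.copy())
        gs.append(gx.copy())
        if len(xs) > m + 1:               # keep a window of at most m+1 points
            xs.pop(0); gs.pop(0)
        # residuals f_i = g(x_i) - x_i over the window, stacked column-wise
        F = np.stack([g - xi for g, xi in zip(gs, xs)], axis=1)
        if F.shape[1] == 1:               # not enough history yet: plain GD
            x = gx
            continue
        dF = F[:, 1:] - F[:, :-1]         # residual differences
        G = np.stack(gs, axis=1)
        dG = G[:, 1:] - G[:, :-1]         # map-value differences
        # gamma = argmin ||f_k - dF @ gamma||^2 via regularized normal equations
        A = dF.T @ dF + reg * np.eye(dF.shape[1])
        gamma = np.linalg.solve(A, dF.T @ F[:, -1])
        x = gx - dG @ gamma               # standard AA(m) extrapolation
    return x
```

As a usage example, for a quadratic objective $f(x)=\frac{1}{2}x^{\top}Ax$ one may call `aa_gd(lambda x: A @ x, x0)`; with a modest window (m = 5 here), AA-GD typically reaches a given residual in far fewer iterations than plain GD.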
[1.]
[2.] Bertrand, Q., Massias, M.: Anderson acceleration of coordinate descent. In: International Conference on Artificial Intelligence and Statistics, PMLR, pp. 1288–1296 (2021)
[3.]
[4.]
[5.]
[6.]
[7.] Eddy, R.P.: Extrapolating to the limit of a vector sequence. In: Wang, P.C.C., Schoenstadt, A.L., Russak, B.I., Comstock, C. (eds.) Information Linkage Between Applied Mathematics and Industry, pp. 387–396. Academic Press, Elsevier Inc. (1979)
[8.]
[9.]
[10.]
[11.]
[12.] Geist, M., Scherrer, B.: Anderson acceleration for reinforcement learning. arXiv:1809.09501 (2018)
[13.]
[14.] Li, Z., Li, J.: A fast Anderson-Chebyshev acceleration for nonlinear optimization. In: International Conference on Artificial Intelligence and Statistics, PMLR, pp. 1047–1057 (2020)
[15.] Liu, H.L., Tian, X.P.: AEGD: adaptive gradient descent with energy. arXiv:2010.05109 (2020)
[16.]
[17.] Liu, H.L., Tian, X.P.: Dynamic behavior for a gradient algorithm with energy and momentum. arXiv:2203.12199 (2022)
[18.] Liu, H.L., Tian, X.P.: SGEM: stochastic gradient with energy and momentum. arXiv:2208.02208 (2022)
[19.]
[20.]
[21.] Mai, V., Johansson, M.: Anderson acceleration of proximal gradient methods. In: Daumé, H. III, Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 119, pp. 6620–6629. PMLR (2020)
[22.] Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate $O(1/k^2)$. Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
[23.]
[24.] Pasini, M.L., Yin, J., Reshniak, V., Stoyanov, M.: Stable Anderson acceleration for deep learning. arXiv:2110.14813 (2021)
[25.]
[26.]
[27.]
[28.]
[29.]
[30.] Scieur, D., Bach, F., d'Aspremont, A.: Nonlinear acceleration of stochastic algorithms. In: NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 3985–3994. ACM (2017)
[31.] Scieur, D., d'Aspremont, A., Bach, F.: Regularized nonlinear acceleration. In: NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 712–720. ACM (2016)
[32.]
[33.]
[34.] Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17(1), 5312–5354 (2016)
[35.]
[36.]
[37.]
[38.]
[39.]
[40.]