Convergence of Gradient Algorithms for Nonconvex C^{1+α} Cost Functions
Zixuan Wang, Shanjian Tang
Chinese Annals of Mathematics, Series B, 2023, Vol. 44, Issue 3: 445–464.
This paper is concerned with the convergence of stochastic gradient algorithms with momentum terms in the nonconvex setting. A class of stochastic momentum methods, including stochastic gradient descent, the heavy-ball method and Nesterov's accelerated gradient, is analyzed in a general framework under mild assumptions. Building on the convergence result for expected gradients, the authors prove almost sure convergence through a detailed discussion of the effects of momentum and of the number of upcrossings. Notably, no additional restrictions are imposed on the objective function or the stepsize. Another improvement over previous results is that the usual Lipschitz condition on the gradient is relaxed to Hölder continuity. As a byproduct, the authors apply a localization procedure to extend the results to stochastic stepsizes.
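For context, a cost function f is of class C^{1+α} when it is differentiable with an α-Hölder continuous gradient. A minimal statement of the standard condition follows (the paper's precise notation and constants may differ):

```latex
% Standard C^{1+\alpha} (Hölder-gradient) condition; \alpha = 1
% recovers the usual Lipschitz-gradient assumption of earlier analyses.
\[
  \|\nabla f(x) - \nabla f(y)\| \le L_\alpha \|x - y\|^{\alpha},
  \qquad \forall\, x, y \in \mathbb{R}^d,\quad \alpha \in (0, 1].
\]
```

Setting α = 1 recovers the Lipschitz-gradient case, so the paper's assumption strictly weakens the smoothness required by prior work.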
Keywords: Gradient descent methods / Nonconvex optimization / Accelerated gradient descent / Heavy-ball momentum
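To make the class of methods concrete, below is a minimal sketch of one common unified parametrization of stochastic momentum methods found in the literature; the function name, the constant stepsize alpha (in place of a schedule alpha_k) and the toy cost in the usage example are illustrative assumptions, not the paper's exact framework.

```python
import numpy as np

def stochastic_momentum(grad, x0, alpha, beta, s, n_iters, rng):
    """One common unified stochastic momentum iteration (a sketch; the
    paper's general framework may be parametrized differently).

    beta = 0         -> plain stochastic gradient descent (SGD)
    s = 0, beta > 0  -> stochastic heavy-ball method
    s = 1, beta > 0  -> stochastic Nesterov accelerated gradient

    `grad(x, rng)` is assumed to return a noisy gradient estimate.
    """
    x = np.asarray(x0, dtype=float)
    ys_prev = x.copy()                 # y^s_0 = x_0
    for _ in range(n_iters):
        g = grad(x, rng)               # stochastic gradient g_k at x_k
        y = x - alpha * g              # y_{k+1}   = x_k - alpha * g_k
        ys = x - s * alpha * g         # y^s_{k+1} = x_k - s*alpha * g_k
        x = y + beta * (ys - ys_prev)  # momentum correction step
        ys_prev = ys
    return x

# Toy usage on a smooth nonconvex cost f(x) = sum(x^2 / (1 + x^2))
# with additive Gaussian gradient noise (hypothetical example).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_grad = lambda x, rng: (2 * x / (1 + x**2) ** 2
                                 + 0.1 * rng.standard_normal(x.shape))
    x_out = stochastic_momentum(noisy_grad, np.ones(5), alpha=0.05,
                                beta=0.9, s=1, n_iters=5000, rng=rng)
    print(x_out)  # should land near the stationary point at the origin
```

The single parameter s interpolates between the heavy-ball update (s = 0) and Nesterov's scheme (s = 1), while beta = 0 collapses both to plain SGD, which is what allows the three methods to be treated in one convergence analysis.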