New logarithmic step size for stochastic gradient descent
Mahsa Soheil SHAMAEE, Sajad Fathi HAFSHEJANI, Zeinab SAEIDIAN
In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an $O(1/\sqrt{T})$ convergence rate for SGD, where $T$ is the total number of iterations. We conduct comprehensive experiments to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR-10, and CIFAR-100 datasets. Moreover, we compare our results with nine existing approaches and show that the new logarithmic step size improves test accuracy by 0.9% on the CIFAR-100 dataset when a convolutional neural network (CNN) model is used.
stochastic gradient descent / logarithmic step size / warm restart technique
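To make the idea concrete, the following is a minimal PyTorch sketch of SGD driven by a warm-restart schedule whose learning rate decays logarithmically within each cycle. The particular decay form lr_t = lr_0 * (1 - ln(t+1)/ln(T+1)), the cycle length, and the floor eta_min_ratio are illustrative assumptions for this sketch only; the exact logarithmic step size proposed in the paper is defined in its main text.

```python
import math

import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR


def make_log_warm_restart_scheduler(optimizer, cycle_len, eta_min_ratio=0.01):
    """Warm-restart schedule with a logarithmic decay inside each cycle.

    The decay lr_t = lr_0 * (1 - ln(t + 1) / ln(T + 1)) is an illustrative
    placeholder, not the paper's exact formula; cycle_len (T) and
    eta_min_ratio are hypothetical knobs for this sketch.
    """
    def factor(step):
        t = step % cycle_len  # position inside the current cycle; resets at each restart
        decay = 1.0 - math.log(t + 1) / math.log(cycle_len + 1)
        return max(decay, eta_min_ratio)  # keep the step size strictly positive

    return LambdaLR(optimizer, lr_lambda=factor)


# Usage: plain SGD on a toy model, restarting the schedule every 50 steps.
model = nn.Linear(10, 2)
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = make_log_warm_restart_scheduler(optimizer, cycle_len=50)

for step in range(150):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(torch.randn(32, 10)),
                                       torch.randint(0, 2, (32,)))
    loss.backward()
    optimizer.step()
    scheduler.step()  # update the learning rate for the next iteration
```

At the start of each cycle the multiplier returns to 1.0 (a warm restart to the full base learning rate), and within a cycle it falls off logarithmically rather than linearly or cosinusoidally, which is the schedule shape the paper advocates.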
Mahsa Soheil Shamaee received her BSc in Applied Mathematics from Alzahra University, Iran, in 2009, and her MSc and PhD in Computer Science from Amirkabir University of Technology (Tehran Polytechnic), Iran, in 2012 and 2018, respectively. She has been an Assistant Professor in the Department of Computer Science at the University of Kashan, Iran, since 2020. Her research interests include soft computing, reinforcement learning, wireless networks, multi-agent systems, and machine learning.
Sajad Fathi Hafshejani received his PhD in Mathematics from Shiraz University of Technology, Iran, and currently serves as a PIMS postdoctoral fellow at the University of Lethbridge, Canada. His primary research interests include convex and non-convex optimization, machine learning, quantum computing, and interior-point methods.
Zeinab Saeidian received her BSc and MSc in Applied Mathematics from the University of Tehran, Iran, in 2008 and 2010, respectively, and her PhD in Applied Mathematics (Optimization) from K. N. Toosi University of Technology, Iran, in 2015. She has been an Assistant Professor in the Department of Applied Mathematics at the University of Kashan, Iran, since 2017. Her research interests include nonlinear optimization, combinatorial optimization, network flows, and machine learning.