1. Department of Computer Science, Faculty of Mathematical Science, University of Kashan, Kashan 87317-53153, Iran
2. Department of Applied Mathematics, Shiraz University of Technology, Shiraz 13876-71557, Iran
3. Department of Mathematical Sciences, University of Kashan, Kashan 87317-53153, Iran
s.fathi@sutech.ac.ir
History: Received 2023-03-25; Accepted 2023-10-13; Revised 2023-10-18; Published 2025-01-15
Abstract
In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an $\mathcal{O}(1/\sqrt{T})$ convergence rate for SGD. We conduct a comprehensive set of experiments to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR10, and CIFAR100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by 0.9% on the CIFAR100 dataset when we utilize a convolutional neural network (CNN) model.
1 Introduction
Stochastic gradient descent (SGD), which dates back to the work of Robbins and Monro [1], is widely used for training modern deep neural networks (DNNs), which in turn achieve state-of-the-art results in multiple problem domains such as image classification [2, 3], object detection [4], and automatic machine translation [5].
The value of the step size (or learning rate) is crucial for the convergence rate of SGD. Selecting an appropriate step size in each iteration ensures that the SGD iterations converge to an optimal solution. If the step size is too large, it may prevent the SGD iterations from reaching the optimal point; conversely, an excessively small step size can lead to slow convergence or cause the iterates to stall at a suboptimal local minimum [6]. To address these challenges, various schemes have been proposed. One popular approach is the Armijo line search, first adapted to SGD by Vaswani et al. [7], which provides theoretical results for strongly convex, convex, and non-convex objective functions. Gower et al. [8] proposed another approach that combines a constant step size with a decreasing one: their algorithm starts with a constant step size and switches to a decreasing step size after a number of iterations proportional to the problem's condition number. This strategy guarantees convergence for strongly convex functions, but it requires knowledge of the condition number and is not applicable to non-convex functions. Additionally, decaying step sizes are commonly used in practice for the non-convex problems encountered in DNN training [2, 9].
Smith [10] proposed an efficient method for setting the step size known as the cyclical learning rate. Training neural networks with cyclical learning rates instead of fixed ones can yield significant improvements in accuracy without manual tuning, often in fewer iterations. Loshchilov and Hutter [11] built on this strategy and proposed a warm restart technique for SGD that does not require gradient information to update the step size at each iteration. The key idea behind warm restarts is that at each restart the learning rate is initialized to a specified value and then scheduled to decrease. Moreover, it has been shown that warm-restarted SGD requires 2–4 times less training time than a traditional learning rate adjustment strategy [12]. In recent years, numerous step sizes with warm restarts have been introduced [6, 13]. Vrbančič and Podgorelec [12] extended the idea of the cosine step size and proposed three different step sizes with warm restarts.
The convergence rate, from a theoretical standpoint, is a crucial metric for assessing the effectiveness and efficiency of an algorithm. In the case of the SGD algorithm, the convergence rate depends on various factors, such as the type of objective function (e.g., strongly convex, convex, or non-convex) and the value of the step size used in each iteration. For convex and smooth functions with Lipschitz continuous gradients, it has been proven that the SGD algorithm can achieve a convergence rate of $\mathcal{O}(1/\sqrt{T})$ [14, 15]. The rate of convergence for strongly convex functions is $\mathcal{O}(1/T)$ [16, 17]. When the objective is smooth and non-convex, Ghadimi and Lan [15] provided an $\mathcal{O}(1/\sqrt{T})$ rate of convergence for SGD with a constant step size. In another work, Li et al. [18] established an $\mathcal{O}(1/\sqrt{T})$ rate of convergence for both the cosine and exponential step sizes, which were first considered in [11]. Moreover, their empirical results confirm that both step sizes achieve the best accuracy and training loss. Ge et al. [19] considered the step decay schedule for solving least squares problems. Wang et al. [20] took into account the influence of the probability distribution of the output iterate on both the implementation and the theoretical aspects of the algorithm. They demonstrated that, within this framework, SGD with the exponential step size achieves a near-optimal convergence rate. Notably, their findings reveal that assigning a higher probability to the final iterations through the step size leads to improvements in accuracy and in the loss function.
In the context of many decay step sizes, the convergence rate of the SGD algorithm is commonly analyzed by considering specific values for the parameters associated with the step size. Several studies, including [15, 18, 20], have derived convergence rates for the SGD algorithm by imposing constraints or assumptions on the initial step size or on the parameters related to the step size. These conditions play a crucial role in ensuring the convergence properties of the algorithm. Tab.1 presents the convergence rates of various decay step sizes; the second column of the table indicates the conditions that must be satisfied by the initial step size or the associated parameters in order to achieve the stated convergence properties. As presented in Tab.1, to achieve a convergence rate of order $\mathcal{O}(1/\sqrt{T})$, Li et al. [18] imposed assumptions that force the initial step size $\eta_0$ to be very small. It is evident that for large values of $T$, the initial step size becomes extremely tiny, potentially impacting the algorithm's efficiency.
1.1 Contribution
Motivated by the aforementioned studies, our work introduces a novel logarithmic step size for stochastic gradient descent (SGD) with the warm restarts technique. The main contributions of this paper can be summarized as follows:
● The newly proposed step size offers a significant advantage over the cosine step size [18] in terms of the probability distribution used to select the output iterate in Theorem 1. This distribution determines the likelihood of a specific iterate being chosen as the output. Fig.1(a) illustrates that the cosine step size assigns a higher selection probability to the initial iterations but reduces the probability for the final iterations almost to zero. In contrast, the new step size is more effective for the final iterations, which enjoy a higher probability of selection than under the cosine step size (see the sketch after this list). Consequently, the new step size outperforms the cosine step size in the final iterations, which benefit from an increased likelihood of being chosen as the solution.
● For the new step size, we establish convergence results for the SGD algorithm. Under a condition that allows the initial step size to be larger than the initial step size required in [18], we demonstrate a convergence rate of $\mathcal{O}(1/\sqrt{T})$ for smooth non-convex functions. This rate matches the best-known convergence rate for smooth non-convex functions.
● We conduct a comparative experimental analysis of the new logarithmic step size against nine popular step sizes: the Armijo line search [7], the cosine step size [18], Adam, the $\mathcal{O}(1/t)$ and $\mathcal{O}(1/\sqrt{t})$ step sizes, a constant step size, ReduceLROnPlateau, stagewise decay with one milestone, and stagewise decay with two milestones. We demonstrate the effectiveness of the newly proposed step size in improving the accuracy and training loss of SGD on the FashionMNIST, CIFAR10, and CIFAR100 datasets (e.g., see Fig.1(c)). Notably, we observe a significant accuracy improvement of 0.9% on the CIFAR100 dataset when a convolutional neural network model is used.
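To make the first contribution concrete, the following minimal sketch compares the probability of selecting each iterate, $p_t = \eta_t / \sum_s \eta_s$, under a cosine schedule and under a slowly decaying logarithmic schedule. The logarithmic form used here, $\eta_0/(1+\ln(1+t))$, is only an illustrative stand-in for Eq. (4), which is not reproduced in this excerpt; the point is the qualitative tail behaviour, not the exact formula.

```python
# Compare the selection probability p_t = eta_t / sum_s eta_s for a cosine
# schedule and an illustrative logarithmic schedule (placeholder for Eq. (4)).
import math

T = 1000      # iterations per cycle (assumed for illustration)
eta0 = 0.5    # initial step size (assumed for illustration)

cosine = [0.5 * eta0 * (1 + math.cos(t * math.pi / T)) for t in range(1, T + 1)]
logar = [eta0 / (1.0 + math.log(1.0 + t)) for t in range(1, T + 1)]

def selection_probs(etas):
    total = sum(etas)
    return [e / total for e in etas]

p_cos, p_log = selection_probs(cosine), selection_probs(logar)

# Probability mass placed on the last 5% of iterations by each schedule:
# the cosine schedule drives it towards zero, the logarithmic one does not.
tail = slice(int(0.95 * T), T)
print("cosine tail mass:", sum(p_cos[tail]))
print("log    tail mass:", sum(p_log[tail]))
```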
The remaining sections of this paper are structured as follows. In Section 2, we introduce the new logarithmic step size and discuss its properties. Section 3 is dedicated to the analysis of convergence rates for the proposed step size on smooth non-convex functions, where we establish that the SGD method achieves a convergence rate of $\mathcal{O}(1/\sqrt{T})$. In Section 4, we present and discuss the numerical results obtained using the new decay step size. Finally, in Section 5, we summarize our findings and draw conclusions based on our study.
In this paper, we use the following notational conventions:
● The Euclidean norm of a vector $x$ is denoted by $\|x\|$.
● The nonnegative orthant and the positive orthant of $\mathbb{R}^d$ are denoted by $\mathbb{R}^d_{+}$ and $\mathbb{R}^d_{++}$, respectively.
● We write $a_t = \mathcal{O}(b_t)$ to indicate that there exists a positive constant $c$ such that $a_t \leq c\, b_t$ for all $t$.
2 Problem set-up
We consider the following optimization problem:
$$\min_{x \in \mathbb{R}^d} \ f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x), \qquad (1)$$
where $f_i$ is the loss function for the $i$th training sample over the variable $x \in \mathbb{R}^d$ and $n$ denotes the number of samples. This minimization problem is central in machine learning. Several iterative approaches for solving Eq. (1) are known [21], and SGD is particularly popular when the dimensionality $d$ is extremely large. At iteration $t$, SGD uses a randomly chosen training sample $i_t$ to update $x_t$ using the rule:
$$x_{t+1} = x_t - \eta_t \nabla f_{i_t}(x_t),$$
in which $\eta_t$ is the step size used in iteration $t$ and $\nabla f_{i_t}(x_t)$ is the (average) gradient of the loss function(s) [7].
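The update rule above is easy to state in code. The following is a minimal, self-contained sketch of plain SGD on a toy finite-sum problem; the quadratic losses and the $1/\sqrt{t}$ step size are illustrative choices, not the paper's method.

```python
# Minimal SGD sketch: at each iteration draw a random sample index and move
# against that sample's gradient, scaled by the current step size.
import random

def sgd(grad_fi, n, x0, step_size, T):
    """grad_fi(i, x): gradient of the i-th sample loss at x; step_size(t): eta_t."""
    x = list(x0)
    for t in range(1, T + 1):
        i = random.randrange(n)                      # random training sample i_t
        g = grad_fi(i, x)
        x = [xj - step_size(t) * gj for xj, gj in zip(x, g)]
    return x

# Toy problem: f_i(x) = 0.5 * ||x - a_i||^2, so grad f_i(x) = x - a_i.
a = [[float(i), float(-i)] for i in range(10)]
grad = lambda i, x: [xj - aj for xj, aj in zip(x, a[i])]
x_hat = sgd(grad, n=10, x0=[5.0, 5.0], step_size=lambda t: 0.1 / t ** 0.5, T=2000)
print(x_hat)  # should be close to the minimizer, the mean of the a_i: [4.5, -4.5]
```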
Our analysis and results in this paper are based on the following assumptions.
● Assumption 1 The function $f$ is differentiable and $L$-smooth, that is, there exists a constant $L > 0$ such that for all $x, y \in \mathbb{R}^d$:
$$\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\|.$$
● Assumption 2 For any random sample index $i$ and all $x \in \mathbb{R}^d$, the stochastic gradient has bounded variance, i.e., $\mathbb{E}\big[\|\nabla f_{i}(x) - \nabla f(x)\|^2\big] \leq \sigma^2$ for some $\sigma > 0$.
● Assumption 3 The objective function $f$ is bounded below on $\mathbb{R}^d$.
2.1 The new step size
It has been observed that larger step size values provide the model with sufficient energy to escape critical points in the initial iterations, while smaller step sizes guide the model towards well-behaved local minima in the final iterations [6]. However, when a step size takes large values over many iterations and then rapidly tends to zero towards the end, it adversely affects the behavior of the selection probability distribution: under the conditions described in Theorem 1, the probability of selecting points from the final iterations decreases significantly compared to the initial iterations. To address these issues, we propose a step size whose values decrease gradually over the iterations. In this context, we introduce a logarithmic step size that converges to zero more slowly than many other step sizes, yet faster than the cosine step size. The new step size is defined as follows:
Definition 1 For a constant , the new logarithmic step size for SGD is given by
where $T$ represents the number of iterations in each inner loop of the algorithm and $t$ represents the number of epochs performed since the last restart. To use the warm restart policy, we follow the model proposed in [10]: the training process is divided into cycles (i.e., the outer loop of Algorithm 1), and the algorithm runs each cycle for $T$ epochs. Each cycle begins with the largest step size value $\eta_0$. During a cycle, the learning rate decreases according to Eq. (4), and at the end of the cycle the smallest step size value is used. The behaviour of the newly proposed step size for different values of $T$ is shown in Fig.1(b). It is clear that for small values of $T$ the algorithm performs more inner loops and the step size values quickly go to zero, whereas when $T$ covers all epochs the algorithm performs a single inner loop and the step size goes to zero very slowly.
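The following sketch illustrates how such a warm-restarted schedule is generated: within each cycle of $T$ epochs the step size decays, and at every restart it is reset to $\eta_0$. The decay function used here, $\eta_0/(1+\ln(1+t))$, is only an illustrative placeholder for Eq. (4), whose exact form is given in the paper.

```python
# Illustrative warm-restart schedule: n_cycles cycles of T epochs each,
# resetting to eta0 at the start of every cycle.
import math

def log_step_size(eta0, t):
    """Illustrative logarithmic decay within a cycle (placeholder for Eq. (4))."""
    return eta0 / (1.0 + math.log(1.0 + t))

def warm_restart_schedule(eta0, T, n_cycles):
    schedule = []
    for _ in range(n_cycles):            # outer loop: one pass per restart
        for t in range(1, T + 1):        # inner loop: T epochs per cycle
            schedule.append(log_step_size(eta0, t))
    return schedule

etas = warm_restart_schedule(eta0=0.5, T=30, n_cycles=3)
print(etas[:3])      # large values at the start of the first cycle
print(etas[28:32])   # decayed values at the end of cycle 1, then the reset at cycle 2
```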
The lemma that follows gives us some properties of the function . We will utilize them frequently in the rest of the paper.
Lemma 1 For the function , we have
●
● .
● , for all .
3 Algorithm and convergence
We use the warm restart Algorithm 1 under a condition which means that the algorithm runs the same number of epochs in each inner loop. Algorithm 1 begins with the given initial step size, the number of inner iterations, the number of outer epochs, and the starting point. The algorithm consists of two loops, an outer loop and an inner loop; in each inner loop, SGD with the logarithmic step size is performed.
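The skeleton below is a hedged reading of this two-loop structure, not a verbatim reproduction of Algorithm 1: an outer loop over restarts and an inner loop of $T$ epochs whose counter drives the step size. For brevity it performs one stochastic gradient step per inner epoch; a full implementation would iterate over all mini-batches within each epoch.

```python
# Two-loop warm-restart SGD skeleton (outer loop = restarts, inner loop = epochs).
import random

def warm_restart_sgd(grad_fi, n, x0, step_size, T, n_restarts):
    """step_size(t, T): learning rate at inner epoch t (1-based) of a cycle of length T."""
    x = list(x0)
    for _ in range(n_restarts):              # outer loop: each warm restart
        for t in range(1, T + 1):            # inner loop: step size recomputed from t,
            i = random.randrange(n)          # which resets to 1 at every restart
            g = grad_fi(i, x)
            x = [xj - step_size(t, T) * gj for xj, gj in zip(x, g)]
    return x
```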
3.1 Convergence results for smooth functions
We present the convergence bound for SGD with the logarithmic step size when the objective function is smooth and non-convex. The lemma that follows provides a lower bound for the sum of the step sizes, $\sum_{t=1}^{T} \eta_t$.
Lemma 2 For the new proposed step size given by Eq. (4), we have
The next lemma gives an upper bound for the sum of the squared step sizes, $\sum_{t=1}^{T} \eta_t^2$.
Lemma 3 For the step size given by Eq. (4), we have
Lemma 4 Suppose that Assumptions 1 and 2 hold and that the step size satisfies an appropriate condition. Then, Algorithm 1 with the new proposed step size guarantees:
The next theorem gives an upper bound on the expected gradient norm at the output iterate.
Theorem 1 Suppose that Assumptions 1 and 2 hold and that the initial step size satisfies an appropriate condition. Then SGD with the new proposed step size guarantees:
where $\bar{x}_T$ is a random iterate drawn from the sequence $\{x_t\}_{t=1}^{T}$ with probability $\mathbb{P}(\bar{x}_T = x_t) = \eta_t / \sum_{s=1}^{T} \eta_s$.
Proof Using the definition of $\bar{x}_T$ and Lemma 4, we have
where the third inequality is obtained by using Lemmas 2 and 3. □
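To make the output rule concrete, the sketch below draws the reported iterate at random with probability proportional to its step size, the standard randomized output used in non-convex SGD analyses; this weighting is assumed here, since the theorem's exact weights are reconstructed rather than quoted.

```python
# Sample the output iterate with probability p_t proportional to eta_t.
import random

def sample_output(iterates, step_sizes, seed=0):
    """Return one iterate, chosen with probability eta_t / sum_s eta_s."""
    rng = random.Random(seed)
    total = sum(step_sizes)
    weights = [e / total for e in step_sizes]
    return rng.choices(iterates, weights=weights, k=1)[0]

etas = [0.5 / (1 + i) ** 0.5 for i in range(100)]   # a slowly decaying schedule
xs = list(range(100))                               # stand-ins for the iterates x_1, ..., x_T
print(sample_output(xs, etas))                      # later iterates keep a non-trivial chance
```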
As mentioned earlier, selecting a smaller value for this parameter than the one used for the cosine step size in [18] enables us to achieve the optimal convergence rate for SGD based on the new logarithmic step size. Specifically, we have the following result:
Corollary 1 Based on Theorem 1, with an appropriate choice of the step size parameters, we have
Now, leveraging the results obtained from Theorem 1, we can compute the convergence rate for the warm restart SGD algorithm.
Corollary 2 (SGD with warm restarts) Under Assumptions 1, 2, and 3, and for suitable choices of the algorithm parameters, Algorithm 1 ensures the following convergence guarantee:
where for .
4 Numerical results
4.1 Experiments on FashionMNIST, CIFAR10, and CIFAR100
In this section, we evaluate the proposed algorithm's performance on image classification tasks. We compare the performance of the proposed approach with that of state-of-the-art methods on the FashionMNIST, CIFAR10, and CIFAR100 datasets [3]. FashionMNIST is a dataset consisting of a training set of 60,000 grayscale images and a test set of 10,000 examples; each image is 28×28 pixels. We use a convolutional neural network (CNN) model for the classification task on this dataset. This model has two convolutional layers, two max-pooling layers, and two fully connected layers, with rectified linear unit (ReLU) activations in the hidden nodes. To prevent overfitting, we use dropout in the hidden layer of the deep model. The number of input neurons is 784, and the number of output neurons is 10. To compare the performance of the various algorithms, we use the cross-entropy function as the loss function and accuracy as the evaluation metric.
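A hedged PyTorch sketch of a CNN of this shape is given below. The text above elides the exact kernel sizes, padding, pooling sizes, hidden width, and dropout rate, so the values used here (5×5 kernels, padding 2, 2×2 max pooling, 128 hidden units, dropout 0.5) are placeholders, not the authors' configuration.

```python
# Sketch of a two-conv, two-FC CNN for 28x28 grayscale FashionMNIST images.
import torch.nn as nn

class FashionCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),   # placeholder sizes
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128), nn.ReLU(),                   # 28 -> 14 -> 7 after pooling
            nn.Dropout(p=0.5),                                       # dropout in the hidden layer
            nn.Linear(128, num_classes),                             # 10 output classes
        )

    def forward(self, x):                                            # x: (N, 1, 28, 28)
        return self.classifier(self.features(x))

criterion = nn.CrossEntropyLoss()                                    # loss used for comparison
```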
The CIFAR10 dataset consists of 60,000 32×32 color images in 10 classes, with 6,000 images per class; there are 50,000 training images and 10,000 test images. The batch size is 128, so each training epoch comprises roughly 391 iterations. The deep learning architecture used to evaluate the algorithms on this dataset is a residual neural network (ResNet) [22]. This model uses cross-entropy as the loss function.
The CIFAR100 dataset is just like CIFAR10, except that it has 100 classes containing 600 natural images each. There are 500 training images and 100 testing images per class, for a total of 50,000 training images and 10,000 testing images. Randomly cropped and flipped images are used for training. We conducted two series of experiments on this dataset based on two deep learning models. The first is the same CNN used on FashionMNIST, described above; this model was chosen because of limitations in the hardware available for running the experiments. The second is a DenseNet-BC model [9].
To optimize the hyperparameter of the newly proposed step decay on the CIFAR100 dataset, we employed a two-stage grid search on the validation set. Initially, we explored a rough grid and picked the hyperparameters that produced the best validation outcomes. Subsequently, we conducted a more refined search centered around the most effective hyperparameters identified in the first stage and chose the optimal set of hyperparameters as the final selection. For the starting step size $\eta_0$, we used a coarse search grid of {0.00001, 0.0001, 0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1} and a fine grid of {0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59}, from which the best value of $\eta_0$ was selected. We compared the performance of our proposed method with state-of-the-art methods whose hyperparameter values were fine-tuned in a previous study [18], and we adopted the same hyperparameter values in our own numerical studies. The values of the hyperparameters are given in Tab.2. In the experiments on the CIFAR100 dataset using the convolutional neural network (CNN), the initial step size is set similarly to the cosine step decay method. The momentum parameter is set to the same value across all methods and datasets, and weight decay is applied with one value for FashionMNIST and CIFAR10 and another for CIFAR100 in all mentioned methods. Furthermore, a batch size of 128 is used in all of the experiments. The parameter $T$ is the ratio of the number of training samples to the batch size. The experiments are repeated with 5 random seeds to eliminate the influence of stochasticity.
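The two-stage search can be sketched as below. The function `validation_accuracy(eta0)` is a hypothetical stand-in for training the model with that starting step size and measuring validation accuracy; the coarse grid matches the one listed above, and the fine grid mirrors the ±0.09 range around the best coarse value.

```python
# Two-stage (coarse, then fine) grid search over the starting step size eta0.
def two_stage_search(validation_accuracy):
    coarse = [1e-5, 1e-4, 1e-3, 0.01, 0.1, 0.2, 0.3, 0.4,
              0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    best_coarse = max(coarse, key=validation_accuracy)

    # Fine grid centred on the best coarse value (step 0.01, centre excluded),
    # mirroring the 0.41, ..., 0.59 grid around 0.5 used in the paper.
    fine = [round(best_coarse + 0.01 * k, 5) for k in range(-9, 10) if k != 0]
    fine = [v for v in fine if v > 0]
    return max(fine + [best_coarse], key=validation_accuracy)

# Usage: best_eta0 = two_stage_search(my_validation_fn)
```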
4.2 Methods
We consider SGD with the following step sizes:
● the SGD constant step size,
● the $\mathcal{O}(1/\sqrt{t})$ step size,
● the $\mathcal{O}(1/t)$ step size,
● the cosine step size,
● the new logarithmic step size given by Eq. (4).
The parameter $t$ represents the iteration number of the inner loop, and each outer iteration consists of $T$ iterations for training on mini-batches. Moreover, we compare the results of the newly proposed logarithmic step size with Adam [23], SGD+Armijo [7], PyTorch's ReduceLROnPlateau scheduler (abbreviated as ReduceLROnPlateau), and stagewise step sizes. Note that, similar to [18], the term "stagewise" refers to the Stagewise - 1 Milestone and Stagewise - 2 Milestone methods. (As a side note, since we use Nesterov momentum in all SGD variants, the stagewise step decay essentially covers the performance of multistage accelerated algorithms, e.g., [24].) In all experiments, we follow the settings proposed in [18].
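For reference, the decaying baselines named above can be written as simple functions of the inner-loop iteration $t$ (1-based) and the cycle length $T$; the cosine form is the one commonly used in [11, 18], and the new logarithmic step size of Eq. (4) is deliberately not reproduced here.

```python
# Baseline step size schedules as functions of (eta0, t, T).
import math

def constant(eta0, t, T):
    return eta0

def inv_sqrt_t(eta0, t, T):                # the O(1/sqrt(t)) step size
    return eta0 / math.sqrt(t)

def inv_t(eta0, t, T):                     # the O(1/t) step size
    return eta0 / t

def cosine(eta0, t, T):                    # cosine step size, commonly eta0/2 * (1 + cos(pi*t/T))
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / T))
```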
4.3 Results and discussion
For clarity, we compare the methods in two groups, as illustrated in Fig.2 and Fig.3. In Fig.2, on the FashionMNIST dataset, the newly proposed step size drives the training loss close to zero, matching the best-known method (SGD+Armijo) after epoch 40, while its test accuracy outperforms all other methods. On the CIFAR10 dataset, the new logarithmic step size is as good as the best previously studied step size, and the newly proposed approach outperforms all the other methods in this group in terms of both training loss and test accuracy. On the other hand, as illustrated in Fig.3, the new logarithmic step size achieves the best training loss on CIFAR10, but the stagewise step decay performs better in terms of test accuracy than the other step sizes.
As previously mentioned, we assessed the performance of the new proposed step size on the CIFAR100 dataset using two deep models for image classification. The results, depicted in Fig.2 and Fig.3, demonstrate that the new proposed step size outperforms all other methods in both groups when the deep model is a convolutional neural network (CNN). Overall, the use of the DenseNet-BC model results in better performance for all step decays, as shown in Fig.4 and Fig.5. Notably, the new proposed step size performs better than the state-of-the-art method (i.e., cosine step decay).
Tab.3–Tab.5 report the average final training loss and test accuracy obtained over 5 runs starting from different random seeds on the FashionMNIST, CIFAR10, and CIFAR100 datasets, respectively.
From Tab.3–Tab.5, we can conclude that:
● From Tab.3, the SGD+Armijo step size obtained the best training loss; however, the proposed step size improved the test accuracy of SGD.
● Tab.4 shows that the new proposed step size has the best training loss and the cosine step size has the best test accuracy.
● From Tab.5 we can conclude that the new proposed step size improves the test accuracy of SGD by 0.9%.
● Based on the information presented in Tab.6, the new proposed step size leads to improvements in both training loss and test accuracy compared to the cosine step decay method.
5 Conclusion
We introduced a new logarithmic step size with a warm restart technique for SGD. We showed that the new step size goes to zero more slowly than most existing step sizes, and that it achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate for SGD with a smooth non-convex objective function. Finally, to demonstrate the effectiveness of the newly proposed step size in practice, we conducted a wide range of experiments on three well-known datasets, namely FashionMNIST, CIFAR10, and CIFAR100, and compared the results with nine other step sizes. For the CIFAR100 dataset, the proposed step size improved the accuracy of the algorithm by 0.9% when we utilized a convolutional neural network model. For the two other datasets, the newly proposed step size also obtained good results.
Further investigation into the convergence rate of SGD based on the logarithmic step size is warranted for convex and strongly convex objective functions. Additionally, while this paper focuses on the convergence rate for smooth non-convex functions that do not satisfy the PL condition, it would be interesting to examine the convergence rate for smooth non-convex functions that satisfy the PL condition.
[1] Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics, 1951, 22(3): 400–407
[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, 60(6): 84–90
[3] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Toronto: University of Toronto, Department of Computer Science, 2009
[4] Redmon J, Farhadi A. YOLO9000: better, faster, stronger. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 6517–6525
[5] Zhang J, Zong C. Deep neural networks in machine translation: an overview. IEEE Intelligent Systems, 2015, 30(5): 16–25
[6] Mishra P, Sarawadekar K. Polynomial learning rate policy with warm restart for deep neural network. In: Proceedings of 2019 IEEE Region 10 Conference (TENCON). 2019, 2087–2092
[7] Vaswani S, Mishkin A, Laradji I, Schmidt M, Gidel G, Lacoste-Julien S. Painless stochastic gradient: interpolation, line-search, and convergence rates. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 335
[8] Gower R M, Loizou N, Qian X, Sailanbayev A, Shulgin E, Richtárik P. SGD: general analysis and improved rates. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 5200–5209
[9] Huang G, Liu Z, Van Der Maaten L, Weinberger K Q. Densely connected convolutional networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 2261–2269
[10] Smith L N. Cyclical learning rates for training neural networks. In: Proceedings of 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). 2017, 464–472
[11] Loshchilov I, Hutter F. SGDR: stochastic gradient descent with warm restarts. In: Proceedings of the 5th International Conference on Learning Representations. 2017
[12] Vrbančič G, Podgorelec V. Efficient ensemble for image-based identification of pneumonia utilizing deep CNN and SGD with warm restarts. Expert Systems with Applications, 2022, 187: 115834
[13] Xu G, Cao H, Dong Y, Yue C, Zou Y. Stochastic gradient descent with step cosine warm restarts for pathological lymph node image classification via PET/CT images. In: Proceedings of the 5th IEEE International Conference on Signal and Image Processing (ICSIP). 2020, 490–493
[14] Nemirovski A, Juditsky A, Lan G, Shapiro A. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 2009, 19(4): 1574–1609
[15] Ghadimi S, Lan G H. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 2013, 23(4): 2341–2368
[16] Bach F, Moulines E. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In: Proceedings of the 24th International Conference on Neural Information Processing Systems. 2011, 451–459
[17] Rakhlin A, Shamir O, Sridharan K. Making gradient descent optimal for strongly convex stochastic optimization. In: Proceedings of the 29th International Conference on Machine Learning. 2012
[18] Li X, Zhuang Z, Orabona F. A second look at exponential and cosine step sizes: simplicity, adaptivity, and performance. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 6553–6564
[19] Ge R, Kakade S M, Kidambi R, Netrapalli P. The step decay schedule: a near optimal, geometrically decaying learning rate procedure for least squares. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1341
[20] Wang X, Magnússon S, Johansson M. On the convergence of step decay step-size for stochastic optimization. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021, 14226–14238
[21] Nocedal J, Wright S J. Numerical Optimization. New York: Springer, 1999
[22] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 770–778
[23] Kingma D P, Ba J. Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations. 2015
[24] Aybat N S, Fallah A, Gurbuzbalaban M, Ozdaglar A. A universally optimal multistage accelerated stochastic gradient method. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 765
[25] Thomas G B, Finney R L, Weir M D, Giordano F R. Thomas' Calculus, Early Transcendentals. 10th ed. Boston: Addison Wesley, 2002