New logarithmic step size for stochastic gradient descent

Mahsa Soheil SHAMAEE , Sajad Fathi HAFSHEJANI , Zeinab SAEIDIAN

Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (1) : 191301 DOI: 10.1007/s11704-023-3245-z
Artificial Intelligence
RESEARCH ARTICLE


Abstract

In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an O(1/√T) convergence rate for SGD. We conduct comprehensive experiments to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR10, and CIFAR100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by 0.9% on the CIFAR100 dataset when we utilize a convolutional neural network (CNN) model.
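
The abstract does not spell out the closed form of the proposed schedule, so the sketch below only illustrates the general idea: a step size that decays logarithmically within each restart period and is reset to its initial value at every warm restart. The decay form eta0 * (1 - ln(t + 1) / ln(T + 1)) and the helper names (log_step_size, warm_restart_schedule, grad_fn) are assumptions made for this example, not the authors' exact formula.

```python
import math


def log_step_size(t, T, eta0=0.1):
    """Hypothetical logarithmic decay for iteration t in [0, T).

    The form eta0 * (1 - ln(t + 1) / ln(T + 1)) is only an illustrative guess
    at a "logarithmic" schedule, not the paper's formula: it equals eta0 at
    t = 0 and decays toward 0 as t approaches T.
    """
    return eta0 * (1.0 - math.log(t + 1) / math.log(T + 1))


def warm_restart_schedule(total_iters, period, eta0=0.1):
    """Restart the logarithmic decay from eta0 every `period` iterations."""
    return [log_step_size(t % period, period, eta0) for t in range(total_iters)]


def sgd(params, grad_fn, total_iters=1000, period=200, eta0=0.1):
    """Plain SGD loop driven by the restarted logarithmic schedule.

    `grad_fn(params, t)` stands in for a stochastic gradient oracle that
    returns one gradient component per parameter.
    """
    for t, eta in enumerate(warm_restart_schedule(total_iters, period, eta0)):
        grads = grad_fn(params, t)
        params = [p - eta * g for p, g in zip(params, grads)]
    return params
```

For instance, warm_restart_schedule(400, 200) produces two identical 200-step decaying segments, each starting back at eta0, which mirrors the warm-restart behavior described in the abstract.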

Keywords

stochastic gradient descent / logarithmic step size / warm restart technique

Cite this article

Mahsa Soheil SHAMAEE, Sajad Fathi HAFSHEJANI, Zeinab SAEIDIAN. New logarithmic step size for stochastic gradient descent. Front. Comput. Sci., 2025, 19(1): 191301. DOI: 10.1007/s11704-023-3245-z



RIGHTS & PERMISSIONS

Higher Education Press


Supplementary files

FCS-23245-OF-MSS_suppl_1
