Global Solutions to Nonconvex Problems by Evolution of Hamilton-Jacobi PDEs

Howard Heaton, Samy Wu Fung, Stanley Osher

Communications on Applied Mathematics and Computation, 2023, Vol. 6, Issue 2: 790-810. DOI: 10.1007/s42967-022-00239-5
Original Paper

Abstract

Many computing tasks can be posed as optimization problems, and in real-world settings the objective functions are often nonconvex and/or nondifferentiable. State-of-the-art methods for solving such problems typically guarantee convergence only to local minima. This work presents Hamilton-Jacobi-based Moreau adaptive descent (HJ-MAD), a zero-order algorithm with guaranteed convergence to global minima, assuming continuity of the objective function. The core idea is to compute gradients of the Moreau envelope of the objective (which is "piecewise convex") with adaptive smoothing parameters. Gradients of the Moreau envelope (i.e., proximal operators) are approximated via the Hopf-Lax formula for the viscous Hamilton-Jacobi equation. Our numerical examples illustrate global convergence.
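For concreteness, the recipe described above can be sketched in a few lines of NumPy. The sketch below is illustrative only, not the authors' implementation: the sample size, smoothing schedule, step size, and test function are assumptions. Gaussian samples drawn around the current iterate are combined with softmin weights, which amounts to a Monte Carlo evaluation of the Cole-Hopf/Hopf-Lax formula for the viscous Hamilton-Jacobi equation; the weighted average approximates the proximal point, and (x - prox)/t serves as the gradient of the Moreau envelope that drives the descent.

    # Illustrative sketch (not the authors' code) of the idea in the abstract:
    # estimate the gradient of the Moreau envelope via a Monte Carlo version of
    # the Cole-Hopf/Hopf-Lax formula, then descend on the smoothed landscape.
    # All parameter values (n_samples, delta, t, step, iters) are assumptions.
    import numpy as np

    def moreau_grad_estimate(f, x, t, delta, n_samples=1000, rng=None):
        """Estimate the gradient of u(x, t) = min_y f(y) + |x - y|^2 / (2 t).

        By Cole-Hopf, the viscous regularization satisfies
            u_delta(x, t) = -delta * log E_{y ~ N(x, delta*t*I)}[exp(-f(y)/delta)],
        and its gradient equals (x - y_hat)/t, where y_hat is the softmin-weighted
        average of the Gaussian samples; y_hat tends to prox_{t f}(x) as delta -> 0.
        """
        rng = np.random.default_rng() if rng is None else rng
        y = x + np.sqrt(delta * t) * rng.standard_normal((n_samples, x.size))
        fy = np.array([f(yi) for yi in y])
        logits = -fy / delta
        w = np.exp(logits - logits.max())   # numerically stabilized softmin weights
        w /= w.sum()
        y_hat = w @ y                       # approximate proximal point
        return (x - y_hat) / t

    def hj_descent(f, x0, t=1.0, delta=0.1, step=0.5, iters=200):
        """Descent on the smoothed objective with a crude adaptive choice of t."""
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            g = moreau_grad_estimate(f, x, t, delta)
            x_new = x - step * g
            # Heuristic stand-in for adaptive smoothing: widen the envelope when
            # progress stalls, shrink it slowly (but not too far) otherwise.
            t = t * 1.5 if np.linalg.norm(x_new - x) < 1e-6 else max(t * 0.95, 1e-2)
            x = x_new
        return x

    if __name__ == "__main__":
        # Nonconvex test function (Griewank-like) with global minimum at the origin.
        f = lambda z: np.sum(z**2) / 50.0 - np.prod(np.cos(z)) + 1.0
        print(hj_descent(f, x0=5.0 * np.ones(2)))

Increasing the smoothing time t widens the envelope so the iterates can step past local minima; the schedule above is only a crude stand-in for the adaptive rule developed in the paper.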

Keywords

Global optimization / Moreau envelope / Hamilton-Jacobi / Hopf-Lax / Cole-Hopf / Proximals / Zero-order optimization

Cite this article

Howard Heaton, Samy Wu Fung, Stanley Osher. Global Solutions to Nonconvex Problems by Evolution of Hamilton-Jacobi PDEs. Communications on Applied Mathematics and Computation, 2023, 6(2): 790-810. https://doi.org/10.1007/s42967-022-00239-5

Funding
U.S. Air Force (MURI FA9550-18-502); Office of Naval Research (N00014-20-1-2787); National Science Foundation (DGE-1650604)
