Gradient Convergence of Deep Learning-Based Numerical Methods for BSDEs
Zixuan Wang, Shanjian Tang
Chinese Annals of Mathematics, Series B, 2021, Vol. 42, Issue 2: 199–216.
The authors prove the gradient convergence of a deep learning-based numerical method for high-dimensional parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs), which is based on the time discretization of stochastic differential equations (SDEs for short) and the stochastic approximation method for nonconvex stochastic programming problems. In the neural-network setting, they take the stochastic gradient descent method, a quadratic loss function, and sigmoid activation functions. Combining classical techniques for randomized stochastic gradients, the Euler scheme for SDEs, and the convergence of neural networks, they obtain an $O(K^{-\frac{1}{4}})$ rate of gradient convergence, with $K$ being the total number of iterative steps.
Keywords: PDEs / BSDEs / Deep learning / Nonconvex stochastic programming / Convergence result
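To make the setting concrete, below is a minimal sketch of a deep BSDE-type solver of the kind the abstract describes: the forward SDE is discretized by the Euler scheme, one subnetwork with sigmoid activations approximates $Z_{t_n}$ at each time step, and plain SGD minimizes the quadratic mismatch at the terminal time. This is written in PyTorch; the coefficients (Brownian forward process, zero driver, terminal condition $g(x)=|x|^2$), network widths, learning rate, batch size, and iteration count K are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of a deep BSDE-type scheme; coefficients and hyperparameters
# are illustrative assumptions, not the authors' configuration.
import torch
import torch.nn as nn

d, N, T = 10, 20, 1.0           # state dimension, time steps, horizon
dt = T / N

class ZNet(nn.Module):
    """Subnetwork approximating Z_{t_n}(x); sigmoid activations as in the abstract."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, d + 10), nn.Sigmoid(),
            nn.Linear(d + 10, d + 10), nn.Sigmoid(),
            nn.Linear(d + 10, d),
        )
    def forward(self, x):
        return self.net(x)

# Illustrative data: X is a Brownian motion, the driver f is zero, g(x) = |x|^2.
def f(t, x, y, z):
    return torch.zeros_like(y)

def g(x):
    return (x ** 2).sum(dim=1, keepdim=True)

z_nets = nn.ModuleList(ZNet(d) for _ in range(N))
y0 = nn.Parameter(torch.zeros(1))            # trainable initial value Y_0
opt = torch.optim.SGD(list(z_nets.parameters()) + [y0], lr=1e-2)

K, batch = 2000, 64                          # K = total number of SGD iterations
for k in range(K):
    x = torch.zeros(batch, d)                # X_0 = 0
    y = y0.expand(batch, 1)
    t = 0.0
    for n in range(N):
        dw = torch.randn(batch, d) * dt ** 0.5        # Brownian increments
        z = z_nets[n](x)
        # Euler step of the BSDE: Y_{n+1} = Y_n - f dt + Z . dW
        y = y - f(t, x, y, z) * dt + (z * dw).sum(dim=1, keepdim=True)
        x = x + dw                           # Euler step of the forward SDE
        t += dt
    loss = ((y - g(x)) ** 2).mean()          # quadratic loss at terminal time
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this setting the paper's result would bound the expected gradient norm of the loss above over the K SGD iterations, which is where the $O(K^{-\frac{1}{4}})$ rate applies.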