Strong Overall Error Analysis for the Training of Artificial Neural Networks Via Random Initializations
Arnulf Jentzen, Adrian Riekert
Although deep learning-based approximation algorithms have been applied very successfully to numerous problems, the reasons for their performance are, at the moment, not entirely understood from a mathematical point of view. Recently, estimates for the convergence of the overall error have been obtained in the setting of deep supervised learning, but with an extremely slow rate of convergence. In this note we partially improve on these estimates. More specifically, we show that the depth of the neural network only needs to grow much more slowly in order to obtain the same rate of approximation. The results hold in the case of an arbitrary stochastic optimization algorithm with i.i.d. random initializations.
Keywords: Deep learning / Artificial intelligence / Empirical risk minimization / Optimization
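The algorithm class analyzed here combines an arbitrary stochastic optimization method with i.i.d. random initializations: the network is trained several times from independent random starting points, and the realization with the smallest empirical risk is kept. Below is a minimal illustrative sketch of this scheme; the shallow ReLU network, plain SGD, the synthetic data, and helper names such as `init_params` and `sgd` are assumptions made for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic supervised-learning data: approximate f(x) = sin(pi x) on [0, 1].
X = rng.uniform(0.0, 1.0, size=(256, 1))
Y = np.sin(np.pi * X)

def init_params(width, rng):
    """One i.i.d. random initialization of a shallow ReLU network."""
    return {
        "W1": rng.normal(0.0, 1.0, size=(1, width)),
        "b1": np.zeros(width),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1)),
        "b2": np.zeros(1),
    }

def forward(p, x):
    h = np.maximum(x @ p["W1"] + p["b1"], 0.0)  # ReLU hidden layer
    return h @ p["W2"] + p["b2"]

def empirical_risk(p):
    """Mean squared error on the full training sample."""
    return float(np.mean((forward(p, X) - Y) ** 2))

def sgd(p, steps=2000, lr=0.05, batch=32):
    """Plain minibatch SGD; stands in for an arbitrary stochastic method."""
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch)
        x, y = X[idx], Y[idx]
        h_pre = x @ p["W1"] + p["b1"]
        h = np.maximum(h_pre, 0.0)
        out = h @ p["W2"] + p["b2"]
        g_out = 2.0 * (out - y) / batch           # d(risk)/d(out)
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ p["W2"].T) * (h_pre > 0)   # backprop through ReLU
        g_W1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        p["W1"] -= lr * g_W1; p["b1"] -= lr * g_b1
        p["W2"] -= lr * g_W2; p["b2"] -= lr * g_b2
    return p

# Train K times from i.i.d. random initializations and keep the
# realization with the smallest empirical risk.
K = 5
best = min((sgd(init_params(width=32, rng=rng)) for _ in range(K)),
           key=empirical_risk)
print("best empirical risk over", K, "initializations:", empirical_risk(best))
```

In analyses of this kind the overall error of the selected realization is typically split into approximation, generalization, and optimization errors, with the minimum over the independent restarts serving to control the optimization error.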