ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximation
Shanshan Tang, Bo Li, Haijun Yu
Communications in Mathematics and Statistics, pp. 1–27.
In a previous study by Li et al. (Commun Comput Phys 27(2):379–411, 2020), it was shown that deep neural networks built with rectified power units (RePU) as activation functions can approximate sufficiently smooth functions better than those built with rectified linear units, by converting power-series polynomial approximations into deep neural networks with optimal complexity and no approximation error. In practice, however, power series approximations are not easy to obtain because of the associated stability issues. In this paper, we propose a new and more stable way to construct deep RePU neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in the frequency domain, we obtain an efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than that by deep RePU nets built from power series, while ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by the power series approach. Since spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to achieve it, and they are expected to be useful in applications that require efficient approximations of smooth functions.
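For readers unfamiliar with the two ingredients the abstract refers to, the following NumPy sketch (not the authors' code; the target function f and the polynomial degrees are chosen only for illustration) checks (i) the exact-representation identities that let RePU units sigma_2(x) = max(0, x)^2 reproduce squares and products without approximation error, and (ii) the spectral decay of the Chebyshev interpolation error for a smooth function, which is the kind of expansion ChebNet converts into a network.

# Illustrative sketch only (assumed setup, not from the paper); NumPy is the
# only dependency, and f(x) = exp(sin(pi*x)) is an arbitrary smooth test function.
import numpy as np
from numpy.polynomial import chebyshev as C

# (i) RePU identities with sigma_2(x) = max(0, x)^2:
#     x^2   = sigma_2(x) + sigma_2(-x)
#     x * y = ((x + y)^2 - (x - y)^2) / 4
sigma2 = lambda z: np.maximum(0.0, z) ** 2
x, y = 0.7, -1.3
assert np.isclose(sigma2(x) + sigma2(-x), x ** 2)
assert np.isclose((sigma2(x + y) + sigma2(-(x + y))
                   - sigma2(x - y) - sigma2(-(x - y))) / 4.0, x * y)

# (ii) Spectral accuracy of Chebyshev interpolation for a smooth function.
f = lambda t: np.exp(np.sin(np.pi * t))            # smooth target on [-1, 1]
t_test = np.linspace(-1.0, 1.0, 2001)
for n in (4, 8, 16, 32):
    nodes = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # Chebyshev points
    coeffs = C.chebfit(nodes, f(nodes), n)         # coefficients c_k of sum_k c_k T_k(t)
    err = np.max(np.abs(C.chebval(t_test, coeffs) - f(t_test)))
    print(f"degree {n:3d}: max interpolation error = {err:.2e}")
# The error decays faster than any fixed power of 1/n (spectral accuracy).

The paper's construction replaces the ill-conditioned power-series expansion by this better-conditioned Chebyshev expansion before converting it into a deep RePU network; the sketch above only illustrates the two underlying facts, not the construction itself.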
References

[2] Bungartz, H.J.: An adaptive Poisson solver using hierarchical bases and sparse grids. In: Iterative Methods in Linear Algebra, Brussels, Belgium, pp. 293–310 (1992)
[6] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, pp. 153–160 (2007)
[10] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems 29, pp. 3844–3852 (2016). arXiv:1606.09375
[16] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., Sainath, T.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
[19] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
[20] Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 115(34), 8505–8510 (2018). https://doi.org/10.1073/pnas.1718942115
[21] Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. Constr. Approx. 55, 73–125 (2022). arXiv:1904.00377
[24] Liang, S., Srikant, R.: Why deep neural networks for function approximation? In: International Conference on Learning Representations (ICLR) (2017). arXiv:1610.04161
[26] Li, B., Tang, S., Yu, H.: PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units. J. Math. Study 53(2), 159–191 (2020). arXiv:1909.05136
[36] Platte, R.B., Trefethen, L.N.: Chebfun: A new kind of numerical computing. In: Fitt, A.D., Norbury, J., Ockendon, H., Wilson, E. (eds.) Progress in Industrial Mathematics at ECMI 2008, Mathematics in Industry, pp. 69–87. Springer, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12110-4_5
[44] Shen, J., Wang, Y., Yu, H.: Efficient spectral-element methods for the electronic Schrödinger equation. In: Garcke, J., Pflüger, D. (eds.) Sparse Grids and Applications – Stuttgart 2014, Lecture Notes in Computational Science and Engineering, pp. 265–289. Springer International Publishing, Stuttgart (2016)
[46] Telgarsky, M.: Representation benefits of deep feedforward networks. arXiv:1509.08101 (2015)
[47] Telgarsky, M.: Benefits of depth in neural networks. In: JMLR: Workshop and Conference Proceedings, vol. 49, pp. 1–23 (2016)
[48] Trefethen, L.N.: Spectral Methods in MATLAB. Software, Environments, and Tools. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2000)
[49] E, W., Wang, Q.: Exponential convergence of the deep neural network approximation for analytic functions. Sci. China Math. 61(10), 1733–1740 (2018)
[53] E, W., Yu, B.: The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018). https://doi.org/10.1007/s40304-018-0127-z
[54] Yu, H., Tian, X., E, W., Li, Q.: OnsagerNet: Learning stable and interpretable dynamics using a generalized Onsager principle. Phys. Rev. Fluids 6(11), 114402 (2021). https://doi.org/10.1103/PhysRevFluids.6.114402. arXiv:2009.02327
[56] Zhang, L., Han, J., Wang, H., Car, R., E, W.: Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120(14), 143001 (2018)