ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximation
Shanshan Tang, Bo Li, Haijun Yu
Communications in Mathematics and Statistics, pp. 1–27.
In a previous study by Li et al. (Commun Comput Phys 27(2):379–411, 2020), it was shown that deep neural networks built with rectified power units (RePU) as activation functions can approximate sufficiently smooth functions better than those built with rectified linear units, by converting power-series polynomial approximations into deep neural networks with optimal complexity and no approximation error. In practice, however, power series approximations are not easy to obtain because of the associated stability issues. In this paper, we propose a new and more stable way to construct deep RePU neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in the frequency domain, we obtain an efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than that by deep RePU nets built from power series, while ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by the power series approach. Since spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to achieve it, and they are expected to be useful in applications that require efficient approximations of smooth functions.
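For readers unfamiliar with the two ingredients the abstract refers to, the following NumPy sketch (not the authors' code; the target function f and the polynomial degrees are chosen only for illustration) checks (i) the exact-representation identities that let RePU units sigma_2(x) = max(0, x)^2 reproduce squares and products without approximation error, and (ii) the spectral decay of the Chebyshev interpolation error for a smooth function, which is the kind of expansion ChebNet converts into a network.

# Illustrative sketch only (assumed setup, not from the paper); NumPy is the
# only dependency, and f(x) = exp(sin(pi*x)) is an arbitrary smooth test function.
import numpy as np
from numpy.polynomial import chebyshev as C

# (i) RePU identities with sigma_2(x) = max(0, x)^2:
#     x^2   = sigma_2(x) + sigma_2(-x)
#     x * y = ((x + y)^2 - (x - y)^2) / 4
sigma2 = lambda z: np.maximum(0.0, z) ** 2
x, y = 0.7, -1.3
assert np.isclose(sigma2(x) + sigma2(-x), x ** 2)
assert np.isclose((sigma2(x + y) + sigma2(-(x + y))
                   - sigma2(x - y) - sigma2(-(x - y))) / 4.0, x * y)

# (ii) Spectral accuracy of Chebyshev interpolation for a smooth function.
f = lambda t: np.exp(np.sin(np.pi * t))            # smooth target on [-1, 1]
t_test = np.linspace(-1.0, 1.0, 2001)
for n in (4, 8, 16, 32):
    nodes = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # Chebyshev points
    coeffs = C.chebfit(nodes, f(nodes), n)         # coefficients c_k of sum_k c_k T_k(t)
    err = np.max(np.abs(C.chebval(t_test, coeffs) - f(t_test)))
    print(f"degree {n:3d}: max interpolation error = {err:.2e}")
# The error decays faster than any fixed power of 1/n (spectral accuracy).

The paper's construction replaces the ill-conditioned power-series expansion by this better-conditioned Chebyshev expansion before converting it into a deep RePU network; the sketch above only illustrates the two underlying facts, not the construction itself.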
References

[2] Bungartz, H.J.: An adaptive Poisson solver using hierarchical bases and sparse grids. In: Iterative Methods in Linear Algebra, Brussels, Belgium, pp. 293–310 (1992)
[6] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, pp. 153–160 (2007)
[10] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems 29, pp. 3844–3852 (2016). arXiv:1606.09375
[16] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Kingsbury, B., Sainath, T.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
[19] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
[20] Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 115(34), 8505–8510 (2018). https://doi.org/10.1073/pnas.1718942115
[21] Kutyniok, G., Petersen, P., Raslan, M., Schneider, R.: A theoretical analysis of deep neural networks and parametric PDEs. Constr. Approx. 55, 73–125 (2022). arXiv:1904.00377
[24] Liang, S., Srikant, R.: Why deep neural networks for function approximation? In: International Conference on Learning Representations (ICLR) (2017). arXiv:1610.04161
[26] Li, B., Tang, S., Yu, H.: PowerNet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units. J. Math. Study 53(2), 159–191 (2020). arXiv:1909.05136
[36] Platte, R.B., Trefethen, L.N.: Chebfun: A new kind of numerical computing. In: Fitt, A.D., Norbury, J., Ockendon, H., Wilson, E. (eds.) Progress in Industrial Mathematics at ECMI 2008, Mathematics in Industry, pp. 69–87. Springer, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12110-4_5
[44] Shen, J., Wang, Y., Yu, H.: Efficient spectral-element methods for the electronic Schrödinger equation. In: Garcke, J., Pflüger, D. (eds.) Sparse Grids and Applications – Stuttgart 2014, Lecture Notes in Computational Science and Engineering, pp. 265–289. Springer International Publishing, Stuttgart (2016)
[46] Telgarsky, M.: Representation benefits of deep feedforward networks. arXiv:1509.08101 (2015)
[47] Telgarsky, M.: Benefits of depth in neural networks. In: JMLR: Workshop and Conference Proceedings, vol. 49, pp. 1–23 (2016)
[48] Trefethen, L.N.: Spectral Methods in MATLAB. Software, Environments, and Tools. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2000)
[49] E, W., Wang, Q.: Exponential convergence of the deep neural network approximation for analytic functions. Sci. China Math. 61(10), 1733–1740 (2018)
[53] E, W., Yu, B.: The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018). https://doi.org/10.1007/s40304-018-0127-z
[54] Yu, H., Tian, X., E, W., Li, Q.: OnsagerNet: Learning stable and interpretable dynamics using a generalized Onsager principle. Phys. Rev. Fluids 6(11), 114402 (2021). https://doi.org/10.1103/PhysRevFluids.6.114402. arXiv:2009.02327
[56] Zhang, L., Han, J., Wang, H., Car, R., E, W.: Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120(14), 143001 (2018)