Optimization of Random Feature Method in the High-Precision Regime
Jingrun Chen, Weinan E, Yifei Sun
Machine learning has been widely used to solve partial differential equations (PDEs) in recent years. Among these approaches, the random feature method (RFM) exhibits spectral accuracy and can compete with traditional solvers in terms of both accuracy and efficiency. However, the optimization problem that arises in the RFM is potentially more difficult to solve than those arising in traditional methods. Unlike the broader machine-learning literature, which frequently targets tasks in the low-precision regime, our study focuses on the high-precision regime that is crucial for solving PDEs. In this work, we study this problem from the following aspects: (i) we analyze the coefficient matrix that arises in the RFM by studying the distribution of its singular values; (ii) we investigate whether continued training causes overfitting; (iii) we test direct, iterative, and randomized methods for solving the optimization problem. Based on these results, we find that direct methods are superior to the other methods when memory is not an issue, while iterative methods typically attain low accuracy and can be improved to some extent by preconditioning.
Keywords: Random feature method (RFM) / Partial differential equation (PDE) / Least-squares problem / Direct method / Iterative method
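To make points (i) and (iii) concrete, the following is a minimal, hypothetical sketch in Python with NumPy/SciPy (not the authors' code): it assembles a toy random-feature coefficient matrix for a 1-D fitting problem, inspects the decay of its singular values, and contrasts a direct SVD-based least-squares solve with the iterative LSQR solver. All sizes, feature choices, and tolerances here are illustrative assumptions.

```python
# A minimal sketch (illustrative assumptions, not the paper's code):
# build a toy random-feature coefficient matrix, inspect its singular
# values, and compare a direct least-squares solve with an iterative one.
import numpy as np
from scipy.linalg import lstsq, svdvals
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)

# Random features phi_j(x) = tanh(w_j * x + b_j) with fixed random weights.
M, J = 2000, 400                      # collocation points, number of features
x = np.linspace(-1.0, 1.0, M)
w = rng.uniform(-4.0, 4.0, J)
b = rng.uniform(-4.0, 4.0, J)
A = np.tanh(np.outer(x, w) + b)       # M x J coefficient matrix

f = np.exp(np.sin(np.pi * x))         # smooth target (stand-in for PDE data)

# (i) Singular-value distribution: rapid decay signals ill-conditioning.
s = svdvals(A)
print(f"cond(A) = {s[0] / s[-1]:.3e}")

# (iii) Direct (SVD-based) vs. iterative (LSQR) least squares.
u_direct, *_ = lstsq(A, f)
u_iter = lsqr(A, f, atol=1e-14, btol=1e-14, iter_lim=10000)[0]
print(f"direct residual: {np.linalg.norm(A @ u_direct - f):.3e}")
print(f"LSQR   residual: {np.linalg.norm(A @ u_iter - f):.3e}")
```

On matrices of this type the singular values tend to decay rapidly, so the iterative solver stalls well above the accuracy of the direct solve unless it is preconditioned, which is consistent with the conclusion stated in the abstract.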