Overview Frequency Principle/Spectral Bias in Deep Learning
Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo
Communications on Applied Mathematics and Computation, 2024, Vol. 7, Issue 3: 827-864.
Understanding deep learning has become increasingly urgent as it penetrates ever more deeply into industry and science. In recent years, a line of research based on Fourier analysis has shed light on this "black box" by identifying a frequency principle (F-Principle, also known as spectral bias) in the training behavior of deep neural networks (DNNs): DNNs often fit target functions from low to high frequencies during training. The F-Principle was first demonstrated on one-dimensional (1D) synthetic data and subsequently verified on high-dimensional real datasets. A series of later works further established its validity. This low-frequency implicit bias reveals the strength of neural networks in learning low-frequency functions as well as their deficiency in learning high-frequency functions. Such understanding has inspired the design of DNN-based algorithms for practical problems, explained experimental phenomena arising in various scenarios, and further advanced the study of deep learning from the frequency perspective. Although necessarily incomplete, this article provides an overview of the F-Principle and proposes some open problems for future research.
Neural network / Frequency principle (F-Principle) / Deep learning / Generalization / Training / Optimization / Information and Computing Sciences / Artificial Intelligence and Image Processing
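The low-to-high frequency training behavior described in the abstract is easy to probe numerically. Below is a minimal, hypothetical sketch (not the authors' code) of the kind of 1D synthetic experiment referred to above: a small fully connected network is trained on f(x) = sin(x) + sin(3x) + sin(5x), and the relative error of each frequency component of the network output is tracked via the discrete Fourier transform. The architecture, optimizer, and all hyperparameters here are illustrative assumptions.

```python
# A minimal sketch of a 1D F-Principle experiment (illustrative, not the authors' code).
# Assumed target: f(x) = sin(x) + sin(3x) + sin(5x); architecture and hyperparameters
# below are arbitrary choices, not taken from the paper.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Uniform, endpoint-exclusive grid on [-pi, pi) so that the target frequencies
# 1, 3, 5 land exactly on DFT bins 1, 3, 5.
n = 256
x = ((torch.arange(n, dtype=torch.float32) / n) * 2 * np.pi - np.pi).unsqueeze(1)
y = torch.sin(x) + torch.sin(3 * x) + torch.sin(5 * x)

# Small fully connected network with tanh activations.
model = nn.Sequential(
    nn.Linear(1, 200), nn.Tanh(),
    nn.Linear(200, 200), nn.Tanh(),
    nn.Linear(200, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# DFT of the target on the training grid; bins 1, 3, 5 carry the three components.
y_hat = np.fft.rfft(y.squeeze().numpy())
tracked = [1, 3, 5]

def relative_freq_error(pred):
    """Relative error of each tracked DFT component of the prediction."""
    p_hat = np.fft.rfft(pred)
    return [abs(p_hat[k] - y_hat[k]) / (abs(y_hat[k]) + 1e-12) for k in tracked]

for step in range(10001):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 1000 == 0:
        with torch.no_grad():
            errs = relative_freq_error(model(x).squeeze().numpy())
        report = "  ".join(f"freq {k}: {e:.2f}" for k, e in zip(tracked, errs))
        print(f"step {step:5d}  loss {loss.item():.3e}  {report}")
```

Under typical settings of this kind, the printed relative errors decay in order of increasing frequency (frequency 1 first, then 3, then 5), which is the signature of the F-Principle; the precise convergence speeds depend on the architecture, activation function, and optimizer.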