Effects of nonlinearity and inter-feature coupling in machine learning studies of Nb alloys with center-environment features

Yuchao Tang, Bin Xiao, Manabu Ihara, Sergei Manzhos, Yi Liu

Journal of Materials Informatics, 2025, Vol. 5, Issue 3: 38. DOI: 10.20517/jmi.2025.05

Research Article

Abstract

Prediction of materials properties from compositional and structural descriptors with machine learning (ML) methods has emerged as a viable approach to materials design and is a major component of materials informatics. However, because both experimental and computed data can be costly, one often has to work with limited data, which increases the risk of overfitting. This issue can be addressed on the one hand by combining datasets to improve sampling and on the other by designing ML models that are optimal for small datasets. Center-environment (CE) features were recently introduced and have shown promise in predicting formation energies, structural parameters, band gaps, and adsorption properties of various materials. Here, we use CE features to predict the formation energies of Nb and Nb-Nb5Si3 eutectic alloys in which various alloying elements substitute into the Nb and Nb5Si3 phases. This is a typical alloy system in which the data naturally divide into subsets according to the type of substitutional site. We explore the effects of combining datasets and of the functional form of the dependence of the target property on the features. We show that combining the subsets, despite the increased amount of data, can complicate rather than facilitate ML: the subsets do not increase the sampling density but instead sample different parts of the feature space with different distributions, and they also have different optimal hyperparameters. Using the Gaussian process regression-neural network hybrid ML method to separate the effects of nonlinearity and inter-feature coupling, we show that nonlinearity is unimportant for Nb alloys but critical for Nb-Nb5Si3 alloys. We also find that inter-feature coupling terms are either unimportant or cannot be recovered from the data, which demonstrates the utility of more robust and interpretable additive models.
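The abstract separates two ingredients of a model's functional form: nonlinearity in each feature and coupling between features. The sketch below illustrates that distinction on synthetic data only, assuming a swapped-in technique rather than the authors' Gaussian process regression-neural network hybrid: plain Gaussian process regression with a first-order additive kernel (a sum of one-dimensional kernels, in the spirit of additive Gaussian processes) compared against a full-dimensional kernel and a linear fit. Comparing the three test errors indicates whether a target needs nonlinearity, coupling, or neither. The toy targets, the fixed length scale and noise level, and all helper names (`compare`, `additive_kernel`, `full_kernel`, and so on) are hypothetical; no CE features or Nb alloy data are used.

```python
import numpy as np


def rbf_1d(a, b, length):
    """Squared-exponential kernel between 1D arrays a (n,) and b (m,)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def additive_kernel(A, B, length):
    """First-order additive kernel: a sum of 1D kernels, one per feature.

    The resulting model is a sum of one-dimensional functions f_i(x_i):
    nonlinear in each feature, but with no inter-feature coupling terms.
    """
    return sum(rbf_1d(A[:, i], B[:, i], length) for i in range(A.shape[1]))


def full_kernel(A, B, length):
    """Full-dimensional squared-exponential kernel: allows coupling between all features."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)


def gpr_predict(kernel, X_train, y_train, X_test, length, noise=1e-4):
    """Posterior mean of a zero-mean GP with a fixed (not optimized) length scale."""
    K = kernel(X_train, X_train, length) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return kernel(X_test, X_train, length) @ alpha


def linear_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept: neither nonlinearity nor coupling."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


def rmse(pred, ref):
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def compare(target, dim=4, n_train=60, n_test=200, length=0.5, seed=0):
    """Test-set RMSE of linear, additive-GP and full-GP models on a synthetic target."""
    rng = np.random.default_rng(seed)
    X_tr = rng.uniform(-1.0, 1.0, size=(n_train, dim))
    X_te = rng.uniform(-1.0, 1.0, size=(n_test, dim))
    y_tr, y_te = target(X_tr), target(X_te)
    mu = y_tr.mean()  # center the targets because the GP prior has zero mean
    return {
        "linear": rmse(linear_predict(X_tr, y_tr, X_te), y_te),
        "additive": rmse(gpr_predict(additive_kernel, X_tr, y_tr - mu, X_te, length) + mu, y_te),
        "full": rmse(gpr_predict(full_kernel, X_tr, y_tr - mu, X_te, length) + mu, y_te),
    }


def additive_target(X):
    """Hypothetical toy target: nonlinear in each feature, no coupling."""
    return np.sin(2.0 * X).sum(axis=1)


def coupled_target(X):
    """Hypothetical toy target: a pure two-feature coupling term."""
    return X[:, 0] * X[:, 1]


if __name__ == "__main__":
    print("additive target:", compare(additive_target, dim=4))
    print("coupled target: ", compare(coupled_target, dim=2))
```

On the first toy target one would expect the additive model to beat the linear fit (nonlinearity matters) without needing coupling terms, whereas on the product target only the full kernel can represent the `x0 * x1` term. The length scale is fixed for brevity; in practice it would be tuned, for example by cross-validation or by maximizing the marginal likelihood.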

Keywords

Materials informatics / machine learning / kernel regression / feature engineering / alloys

Cite this article

Yuchao Tang, Bin Xiao, Manabu Ihara, Sergei Manzhos, Yi Liu. Effects of nonlinearity and inter-feature coupling in machine learning studies of Nb alloys with center-environment features. Journal of Materials Informatics, 2025, 5(3): 38. DOI: 10.20517/jmi.2025.05
