Effects of nonlinearity and inter-feature coupling in machine learning studies of Nb alloys with center-environment features
Yuchao Tang , Bin Xiao , Manabu Ihara , Sergei Manzhos , Yi Liu
Journal of Materials Informatics ›› 2025, Vol. 5 ›› Issue (3) : 38
Effects of nonlinearity and inter-feature coupling in machine learning studies of Nb alloys with center-environment features
Prediction of materials properties from descriptors of chemical composition and structure with machine learning (ML) methods has been emerging as a viable approach to materials design and is a major component of the materials informatics field. However, as both experimental and computed data may be costly, one often has to work with limited data, which increases the risk of overfitting. Combining various datasets to improve sampling on the one hand and designing optimal ML models from small datasets on the other, can be used to address this issue. Center-environment (CE) features were recently introduced and showed promise in predicting formation energies, structural parameters, band gaps, and adsorption properties of various materials. Here, we consider the prediction of formation energies of Nb and Nb-Nb5Si3 eutectic alloys substituted with various alloying elements in the Nb and Nb5Si3 phases using CE features - a typical alloy system where the data can be naturally divided into subsets based on the types of substitutional sites. We explore effects of dataset combination and of the functional form of the dependence of the target property on the features. We show that combining the subsets, despite the increased amount of data, can complicate rather than facilitate ML, as different subsets do not increase the density of sampling but sample different parts of space with different distribution patterns, and also have different optimal hyperparameters. The Gaussian process regression-neural network hybrid ML method was used to separate the effects of nonlinearity and inter-feature coupling and show that while for Nb alloys nonlinearity is unimportant, it is critical to Nb-Nb5Si3 alloys. We find that inter-feature coupling terms are unimportant or non-recoverable, demonstrating the utility of more robust and interpretable additive models.
Materials informatics / machine learning / kernel regression / feature engineering / alloys
| [1] |
|
| [2] |
|
| [3] |
|
| [4] |
|
| [5] |
|
| [6] |
|
| [7] |
|
| [8] |
|
| [9] |
|
| [10] |
|
| [11] |
|
| [12] |
|
| [13] |
|
| [14] |
|
| [15] |
|
| [16] |
|
| [17] |
|
| [18] |
|
| [19] |
|
| [20] |
|
| [21] |
|
| [22] |
|
| [23] |
|
| [24] |
|
| [25] |
|
| [26] |
|
| [27] |
|
| [28] |
|
| [29] |
|
| [30] |
|
| [31] |
|
| [32] |
|
| [33] |
|
| [34] |
|
| [35] |
|
| [36] |
|
| [37] |
|
| [38] |
|
| [39] |
|
| [40] |
|
| [41] |
|
| [42] |
|
| [43] |
A.A.Baikov Institute of Metallurgy and Materials Science. Database on properties of chemical elements. http://phases.imet-db.ru/elements/mendel.aspx?main=1. (accessed 16 Jun 2025) |
| [44] |
|
| [45] |
|
| [46] |
|
| [47] |
|
| [48] |
|
| [49] |
|
| [50] |
|
| [51] |
|
| [52] |
|
| [53] |
|
/
| 〈 |
|
〉 |