1 Introduction
Superconducting materials have demonstrated substantial potential in a wide range of applications, including electricity transmission, railway transportation, strong magnetic fields for nuclear fusion and medical imaging, and quantum computing. The critical temperatures (
Tc) of superconductors, below which zero resistance and perfect diamagnetism emerge, are the most crucial factors influencing their practicality. Therefore, precise prediction of
Tc is essential for advancing the utilization of superconductors. The well-known Bardeen−Cooper−Schrieffer (BCS) theory [
1] has effectively explained many properties of superconductors, especially conventional or low-temperature ones. However, its ability to precisely predict
Tc values and identify new superconductors is still limited [
2-
10].
In recent years, neural network artificial intelligence (AI) models have been applied to study the
Tc of superconductors. In view of previous reports, the descriptors used in AI models can be classified into three groups: chemical composition of materials [
11-
14], structures of materials [
10,
15-
18], and fundamental properties of elements composing the materials [
19-
22]. Among these, the descriptors based on chemical formulas are constrained by the absence of material property parameters; the corresponding models have limited predictive power. For the descriptors based on structural information, the corresponding models cannot predict
Tc for the materials with unknown structures. In contrast, the descriptors based on elemental properties can be easily obtained and play the role of universal descriptors for a wide range of applications in all inorganic materials. We, therefore, chose to use the descriptors based on elemental properties.
Nevertheless, the neural network model using 145 elemental descriptors proposed by Ward
et al. [
23] could not accurately predict
Tc due to the mismatch between small datasets and high-dimensional descriptors [
19]. In this work, we introduced a novel hierarchical neural network (HNN) algorithm to address this issue with a
Tc database from SuperCon [
24] after a careful data-cleaning process. In addition, we used a new set of 909 descriptors, a significant extension of the 145 descriptors introduced by Ward
et al. [
23]. The resulting model prediction exhibits a significant improvement in
R2 values and agrees well with experimental findings. This new AI methodology could have a profound impact on the advancement of superconducting materials, highlighting great benefits stemming from the collaboration between artificial intelligence and materials science.
2 Workflow of AI
In general, the AI workflow involves the following steps: data collection and cleaning, descriptor (feature) construction, descriptor selection, and model construction with a specific algorithm, in this case, a hierarchical neural network framework.
2.1 Data collection and cleaning
To ensure the reliability of machine learning (ML), it is crucial to have a robust and extensive dataset. In this study, we used data from the SuperCon database, which compiles information, including Tc values, from published experimental studies of known superconducting materials. Notably, as the largest database of its kind, SuperCon comprises over 14 794 records of conventional superconductors. However, these data exhibit several defects that hinder their direct applicability in AI:
1) Absence of Tc. In some cases, the composition information of materials exists in the database, but the corresponding Tc information is missing, leading to incomplete datasets. As a result, we cleaned out 3052 data points of this type.
2) Ambiguous composition. For some materials, the oxygen content is given only as a variable "z" in the chemical formula; its actual value is difficult to determine, making the composition indiscernible by computers. As a result, we cleaned out 24 data points of this type.
3) Data redundancy. This issue manifests in two distinct forms. Firstly, an identical composition may be assigned different Tc values; for instance, one compound is reported with Tc of both 8.1 K and 11 K. Secondly, the proportions of elements in a material may be scaled by a common multiple, or the order of elements modified, so that two different chemical notations represent the same material. The first case introduces excessive noise that may compromise model accuracy. The second case impedes the accurate evaluation of the model, because the identical material might simultaneously reside in both the training and testing datasets, leading to significant overestimation of the model's performance. As a result, we cleaned out 6505 data points of this type.
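Detecting the second kind of redundancy requires recognizing that scaled or reordered chemical notations denote the same material. A minimal canonicalization sketch is shown below; the function name and the simple regex parser are illustrative, not the paper's actual pipeline:

```python
from fractions import Fraction
from functools import reduce
from math import gcd
import re

def normalize_formula(formula):
    """Parse a simple chemical formula (e.g. 'Nb3Sn' or 'SnNb3') into a
    canonical form, so that scaled or reordered notations of the same
    material compare equal. Illustrative sketch, not the paper's code."""
    # Element symbol followed by an optional (possibly fractional) count.
    tokens = re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)
    counts = {}
    for elem, n in tokens:
        counts[elem] = counts.get(elem, 0.0) + (float(n) if n else 1.0)
    # Convert counts to exact integer ratios.
    fracs = {e: Fraction(c).limit_denominator(1000) for e, c in counts.items()}
    denom = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in fracs.values()), 1)
    ints = {e: int(f * denom) for e, f in fracs.items()}
    # Divide by the gcd so multiples of the same formula collapse together.
    g = reduce(gcd, ints.values())
    # Canonical order: alphabetical by element symbol.
    return tuple(sorted((e, n // g) for e, n in ints.items()))
```

Deduplicating on such a canonical key catches both `Nb3Sn` vs. `SnNb3` and `Nb3Sn` vs. `Nb6Sn2` before the train/test split, which is what prevents the overestimation described above.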
Following these steps, we cleaned out 9581 data points in total and obtained 5213 entries of reliable data for conventional superconductors. We also considered incorporating high-temperature superconductors into our dataset. However, as emphasized above, the SuperCon database is merely a basic compilation, from which substantial inappropriate data must be removed even for conventional superconductors. The situation is more complex for high-temperature superconductors.
In cuprate superconductors, the superconducting Tc varies with the carrier density introduced by doping. Typically, the Tc versus doping phase diagram displays a dome-shaped characteristic, indicating a specific doping range where superconductivity appears, as shown in Fig.1(a). However, oxygen content is notoriously difficult to measure accurately, so the doping level induced by a change in oxygen content cannot be determined unless the doping level is measured independently. This raises concerns about the accuracy of SuperCon data on cuprate superconductors. As shown in Fig.1(a) and (b), one cuprate system shows the expected dome-shaped characteristic, while another does not, because its oxygen contents were not determined accurately. There are many other examples of such inaccuracies, where the data appear chaotic and inconsistent with expected trends. This indicates that the data on high-temperature superconductors have not been rigorously cleaned and contain many unreliable entries. Reliable data will have to be surveyed from the literature before ML can be applied to high-temperature superconductors. In addition, there is ongoing debate about whether conventional and high-temperature superconductors share the same underlying mechanism. We, therefore, decided not to include high-temperature superconductors in this study.
2.2 Descriptor construction
Previous studies have selected specific descriptors designed only for certain material properties [
25]. However, in the works to predict the formation energy and
Tc of superconductors [
19,
22,
26], Ward
et al. [
23] developed 145 universal descriptors using 22 elemental properties. Here, we followed Ward
et al. [
23] and used a set of universal element descriptors that cover a broad range of properties for inorganic materials. This strategy is expected to enhance the future development of comprehensive AI models that integrate large sets of specialized models in materials science. In this study, we extracted crucial thermal, physical, and crystallographic data from databases and literature [
27-
29], bringing the total number of elemental properties from 22 to 53, as listed in Tab.1.
Furthermore, we followed Ward
et al. [
23] and further developed the descriptor construction rules shown in Tab.2. We added calculations for configurational entropy, valence-electron occupation state, and ionicity. We also added weights for the minimum, maximum, and range values in the rules. Furthermore, we incorporated the Absolute Percentage (AP), which reflects variations in atomic size, and electronegativity, which significantly affects material structures and properties. Finally, based on these properties and rules, we constructed a comprehensive set of descriptors, bringing the total number of universal descriptors from 145 to 909.
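To illustrate such construction rules, the sketch below computes Ward-style fraction-weighted statistics of a single elemental property for one composition. The toy property table and function names are ours for illustration; the paper applies analogous rules across all 53 properties in Tab.1 to obtain the 909 descriptors:

```python
import numpy as np

# Toy elemental property table (illustrative values, not the paper's data).
ATOMIC_RADIUS = {"Nb": 1.98, "Sn": 1.72, "Ti": 1.76}

def composition_descriptors(composition, prop=ATOMIC_RADIUS):
    """Build fraction-weighted statistics of one elemental property for a
    composition given as {element: atomic count}. Illustrative sketch of
    Ward-style descriptor construction rules."""
    elems, counts = zip(*composition.items())
    w = np.array(counts, dtype=float)
    w = w / w.sum()                             # normalize to atomic fractions
    p = np.array([prop[e] for e in elems])
    mean = float(np.dot(w, p))                  # fraction-weighted mean
    return {
        "mean": mean,
        "avg_dev": float(np.dot(w, np.abs(p - mean))),  # mean absolute deviation
        "min": float(p.min()),
        "max": float(p.max()),
        "range": float(p.max() - p.min()),
        # "ap": an absolute-percentage-style spread, one of the added rules
        # (our reading of the AP rule; the paper's exact definition may differ).
        "ap": float((p.max() - p.min()) / mean),
    }
```

Applying one such rule set to each of the 53 elemental properties, plus the composition-level quantities such as configurational entropy, is what grows the descriptor count from 145 to 909.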
2.3 Descriptor selection
Due to the limited data in materials science, training a single neural network with all 909 descriptors is impractical. For instance, an Artificial Neural Network (ANN) with two hidden layers and 909 input descriptors would require over one billion trainable parameters. In principle, a comparable number of data points would be needed to determine these parameters through training, a formidable challenge for materials science problems. In previous studies, researchers have held that too many descriptors are harmful to the ML model [
25]. To tackle this, a Genetic Algorithm (GA) developed by Holland [
30] has been utilized to reduce the descriptor dimensionality (feature selection) in ANN-based ML processes [
31,
32].
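A minimal numpy-only sketch of GA-based descriptor selection follows. Here a closed-form least-squares R² stands in for the paper's far costlier ANN/CNN evaluation as the fitness function, and all GA parameters (population size, mutation rate, generations) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Score a descriptor subset by least-squares R^2 on the training data
    (a fast stand-in for evaluating a trained ANN/CNN on that subset)."""
    Xb = np.c_[X[:, mask], np.ones(len(y))]          # selected columns + bias
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, d, pop=24, gens=40, mut=0.05):
    """Evolve boolean masks that keep exactly d of X.shape[1] descriptors."""
    n = X.shape[1]

    def repair(m):
        # Force the mask back to exactly d active descriptors.
        on = np.flatnonzero(m)
        if len(on) > d:
            m[rng.choice(on, len(on) - d, replace=False)] = False
        off = np.flatnonzero(~m)
        need = int(d - m.sum())
        if need > 0:
            m[rng.choice(off, need, replace=False)] = True
        return m

    def random_mask():
        m = np.zeros(n, dtype=bool)
        m[rng.choice(n, d, replace=False)] = True
        return m

    popu = [random_mask() for _ in range(pop)]
    for _ in range(gens):
        scores = np.array([fitness(m, X, y) for m in popu])
        order = np.argsort(scores)[::-1]
        elite = [popu[i] for i in order[: pop // 2]]  # truncation selection
        children = []
        while len(children) < pop - len(elite):
            a, b = rng.choice(len(elite), 2, replace=False)
            cross = rng.random(n) < 0.5               # uniform crossover
            child = np.where(cross, elite[a], elite[b])
            child ^= rng.random(n) < mut              # bit-flip mutation
            children.append(repair(child))
        popu = elite + children
    scores = np.array([fitness(m, X, y) for m in popu])
    return popu[int(np.argmax(scores))]
```

Repeating such runs many times and counting how often each descriptor appears in the surviving masks yields occurrence statistics like those in Tab.3.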
In Fig.2, the
x-axis represents the descriptor dimension
d, and the
y-axis represents the testing
R2 score. When
d = 49, the accuracy of our Convolutional Neural Network [
33] (CNN) model reached a turning point, and the testing
R2 score saturated at 0.92. Moreover, as shown in Tab.3, the five main descriptors, MD1 to MD5, occurred 1055, 960, 944, 914, and 902 times, respectively, whereas all other descriptors occurred fewer than 500 times, indicating the primary importance of the main descriptors.
2.4 HNN construction
Based on extensive testing, we found that multiple combinations of descriptors with a reduced dimension could yield high-performing CNN models. This suggests that each set of descriptors carries information related to the superconducting critical temperature. We, therefore, concluded that the turning-point effect shown in Fig.2 arises because a limited amount of data can determine only a limited number of parameters for a given neural network structure, not because some descriptors are harmful to AI learning, as stated in previous reports [
20].
Thus, we realized that it is necessary to design an ensemble statistical algorithm that absorbs the knowledge from the entire set of 909 descriptors. Traditional ensemble algorithms include bagging, boosting, and stacking. Bagging and boosting simply average the outputs of the individual models. We have previously used stacking algorithms to statistically combine different ML algorithms trained on the same input descriptors [
22]. However, given the variable performances across ensembles with different descriptor sets, simple averaging cannot effectively capture all valuable insights.
To address this challenge, we developed an ensemble algorithm named the Hierarchical Neural Network (HNN), which statistically integrates large sets of individual ensembles (sub-models) with different weighting factors by training the hyper-parameters of a layered neural network. As shown in Fig.3, the small dotted boxes represent the ANN/CNN sub-models, each of which predicts an output value of Tc. Sub-models at the same horizontal position belong to the same ensemble layer. Each sub-model is indexed by b and c, which denote its ensemble layer and its position within that layer, respectively. The inputs to the HNN's first-layer sub-models have the reduced dimension (49) appropriate for the given dataset and the basic neural network structure.
Importantly, multiple sets of 49 descriptors are used as the inputs of the HNN's first layer. Among these descriptors, some provide common essential information about the superconducting Tc, such as the 5 main descriptors listed in Tab.3. The remaining 44 descriptors are randomly selected with a combinatorial algorithm from the other 904 descriptors. This diversity of descriptors leads to a variety of sub-models, enhancing the generalization ability of the final HNN model. The input of each subsequent layer is the output of the previous layer, so that each sub-model's output carries a trainable weight. By integrating sub-models with various combinations of reduced descriptors as inputs, the HNN model reduces the possibility of overfitting.
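The layered integration can be sketched as follows. Closed-form ridge regressors stand in for the ANN/CNN sub-models, and a second-layer model is trained on the stacked first-layer predictions, i.e. it learns a weight for every sub-model's output rather than simply averaging them. This is a two-layer toy version of the HNN with illustrative names, not the paper's implementation:

```python
import numpy as np

def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression; a cheap stand-in for an ANN/CNN sub-model."""
    Xb = np.c_[X, np.ones(len(X))]                  # append a bias column
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Z: np.c_[Z, np.ones(len(Z))] @ w

def hnn_fit(X, y, subsets):
    """Two-layer hierarchical ensemble: layer 1 holds one sub-model per
    descriptor subset; layer 2 is trained on the stacked layer-1
    predictions, i.e. a learned weighting of the sub-models."""
    subs = [ridge_fit(X[:, s], y) for s in subsets]
    stacked = np.column_stack([m(X[:, s]) for m, s in zip(subs, subsets)])
    meta = ridge_fit(stacked, y)

    def predict(Z):
        P = np.column_stack([m(Z[:, s]) for m, s in zip(subs, subsets)])
        return meta(P)

    return predict
```

The full HNN repeats the integration step over further layers and more than 10³ sub-models, but the key design choice is the same: the combination weights are trained, not fixed.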
3 Results and discussion
To validate the effectiveness of the descriptors and the HNN algorithm proposed in this paper, we used the cleaned dataset consisting of 5213 data points to train different AI models. The input included 145 descriptors proposed by Ward
et al. [
23], 909 descriptors introduced in this study, and one set of 49 descriptors (main descriptors included) selected from these 909 descriptors through GA screening (Fig.2). The AI techniques included ANN, CNN, XGBoost, and HNN-integrated ANN/CNN. XGBoost is widely recognized as a powerful and popular ensemble-learning algorithm [
34]. It leverages the power of multiple regression tree models to enhance predictive performance, with the final output being a simple additive combination of the individual tree predictions. In total, we constructed 13 models, named 49-ANN, 909-HNN(CNN), etc. The
R2 values of these models on the testing dataset are presented in Fig.4(c). In order of their
R2 from low to high, the models are 49-ANN, 145-ANN, 49-XGBoost, 145-CNN, 145-XGBoost, 145-HNN(ANN), 145-HNN(CNN), 49-CNN, 909-ANN, 909-XGBoost, 909-HNN(ANN), 909-CNN, and 909-HNN(CNN). To show the overfitting effect, Fig.4(c) also includes the difference between the
R2 values of the training dataset and the testing dataset for these models, defined as Δ
R2.
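The additive structure that tree boosting builds on can be illustrated with plain regression stumps; this is a simplified stand-in for XGBoost's regularized trees, with illustrative names and hyper-parameters:

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split regression stump on residuals r (squared loss)."""
    best = (np.inf, 0, 0.0, float(r.mean()), float(r.mean()))
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            lm, rm = r[left].mean(), r[~left].mean()
            sse = np.sum((r[left] - lm) ** 2) + np.sum((r[~left] - rm) ** 2)
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    _, j, t, lm, rm = best
    return lambda Z: np.where(Z[:, j] <= t, lm, rm)

def boost_fit(X, y, n_trees=60, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current
    residuals, and the final prediction is the additive sum of a base
    value and all (shrunken) stump outputs."""
    f0 = float(y.mean())
    trees, r = [], y - f0
    for _ in range(n_trees):
        tree = fit_stump(X, r)
        r = r - lr * tree(X)          # update residuals
        trees.append(tree)
    return lambda Z: f0 + lr * sum(tr(Z) for tr in trees)
```

Unlike the HNN, the combination here is a fixed additive sum of the individual tree predictions, which is the contrast drawn in the text.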
There are five significant findings from Fig.4. Firstly, as shown in Fig.4(a), the ANN model with 145 descriptors exhibits large errors on the training set when Tc is greater than 20 K, indicating underfitting and suggesting that these 145 descriptors are incomplete. In contrast, the model with HNN and 909 descriptors shows negligible errors, as illustrated in Fig.4(b). Secondly, compared to individual models, the HNN significantly enhances predictive power, increasing the testing R2 score from 0.830 to 0.956. Thirdly, the performance of the CNN model surpasses that of the fully connected ANN. Fourthly, the performance of the HNN model improves as the number of individual ensembles in the input layer and subsequent integration layers increases; this effect saturates after the third layer, as shown in Fig. S2. Fifthly, when the descriptors were reduced from 909 to 49, the ΔR2 of the individual ANN/CNN/XGBoost models decreased, but the testing R2 also decreased. This indicates that the dimension reduction alleviated overfitting but likely discarded some Tc-related information. Compared to these models, the HNN model not only achieved a higher testing R2 but also reduced overfitting (lower ΔR2).
To further illustrate the ability of the 909-HNN(CNN) model, we decided to test it on high-entropy alloys (HEAs), because they share several features with the conventional superconductors from SuperCon. Firstly, HEAs are metals, similar to conventional superconductors in electrical conductivity. Generally, a good superconductor candidate must first be a conductor. However, the conventional superconductors in SuperCon cover nearly all well-known conducting materials; to our knowledge, possible new conductors primarily come from two categories: amorphous alloys and HEAs. Given the long-range disordered atomic structure of amorphous alloys, with possibly different superconducting mechanisms, we chose high-entropy alloys as our new testing database outside the SuperCon database. Secondly, the
Tc’s of HEAs are typically less than 10 K [
35], which is much lower than the so-called McMillan limit of ~40 K. This implies that the superconducting pairing mechanism in HEAs aligns with that in conventional superconductors. Last but not least, the elements constituting HEAs are all included in the materials from SuperCon (see Fig. S3). Moreover, the cleaned SuperCon database contains 216 materials with more than four elements, similar to HEAs in element count (see Table S2). This suggests that our model should have the ability to predict the
Tc of multi-element materials in HEAs.
After extensive research, we identified 45 HEAs with experimentally measured
Tc from Ref. [
35]. The 13 models constructed above were then used to predict the
Tc of these 45 materials. The results are shown in Fig.5(b). The
R2 values for 49-ANN, 145-ANN, 49-XGBoost, 145-CNN, 145-XGBoost, 145-HNN(ANN), 145-HNN(CNN), 49-CNN, 909-ANN, 909-XGBoost, 909-HNN(ANN), 909-CNN, and 909-HNN(CNN) are 0.72, 0.162, 0.67, 0.502, 0.512, 0.563, 0.694, 0.764, 0.53, 0.585, 0.81, 0.651, and 0.92, respectively. These results are consistent with our analysis based on the testing dataset [Fig.4(c)]. Notably, although these HEAs were not included in our superconductor database, the testing
R2 of the 909-HNN(CNN) model decreased only slightly, from 0.956 to 0.92, as shown in Fig.5(a), indicating that overfitting is negligible. In contrast, other models exhibited much larger decreases in
R2. For 145-ANN,
R2 dropped from 0.891 to 0.162, pointing to very severe overfitting.
In addition, Fig.5(c) shows the distribution of Tc in the conventional superconductors we collected, categorized by the number of elements. We found that 215 materials contain 5 elements and one material contains 6 elements, as listed in Table S2. The materials with the highest Tc values typically contain between 2 and 4 elements. Due to the lack of multi-element (more than 4) data in the database, the model's predictions for higher-order systems may be less accurate than for lower-order systems. Among the 45 HEAs, 23 are quinary systems containing 5 elements and 22 are senary systems containing 6 elements, as shown in Table S1. For these high-order systems, our 909-HNN(CNN) model exhibits mean absolute percentage errors of 5.3% and 6.1%, respectively, with an overall mean absolute percentage error of 5.7%.
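For reference, the error metric quoted here is, in the usual convention (our own minimal implementation, not the paper's code):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, as quoted for the HEA test."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)
```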
These results clearly demonstrate the robustness of our AI model, suggesting that the underlying correlation of the descriptors to the key physical property can be effectively captured in two steps. Firstly, a large number of individual ensembles (more than 10^3) are trained in parallel, each using a different combination of input descriptors but the same dataset. Secondly, the aggregated knowledge from these diverse ensembles is integrated and further refined by the HNN algorithm. This approach leverages both the overlapping and unique information in different ensembles, resulting in a superior integrated outcome.
4 Conclusion
In this study, we developed a new ML framework to predict the Tc of metallic inorganic materials. First, we generated a high-quality dataset screened from the SuperCon database, eliminating the data redundancy that caused dramatic overestimation of model performance in the past. We then created a Hierarchical Neural Network statistical ensemble model to overcome the contradiction between the large number of descriptors and the small size of datasets that previous neural network AI approaches to materials science have always faced. With this hurdle removed, we dramatically expanded the number of elemental-property-based descriptors from 145 to 909. We found that the Hierarchical Neural Network integrating individual Convolutional Neural Networks, each with a reduced dimension of 49 descriptors chosen from the 909, gave rise to a high testing R2 score of 95.6%. The new model was then put to the test on a high-entropy alloy database: an extensive search of the recent literature yielded 45 alloys with measured Tc that were not included in the SuperCon database. We found that the predicted Tc's have a mean absolute percentage error of less than 6% compared with experimental values. This success reveals promising directions for advancing materials science and accelerating the development of new materials tailored for various applications.