2025-06-18 2025, Volume 5 Issue 3

  • Research Article
    Wentao Li, Zemeng Wang, Min Zhao, Jiangfeng Pei, Yiwen Hu, Rui Yang, Xiaonan Wang

    Polymer materials, especially rubber, play an indispensable role in modern life and manufacturing. However, their aging and deterioration pose serious challenges to their stability and service life. Unexpected aging can degrade the physical and chemical properties of materials, thereby triggering a series of safety hazards and environmental pollution issues. Exploring the correspondence between the microscopic characteristics and macroscopic properties of materials during the aging process helps researchers understand and control that process. Symbolic regression (SR), a machine learning method with strong interpretability, plays an important role in exploring quantitative relationships in scientific data. This method has strong potential for discovering the intrinsic quantitative relationships within experimental data on material aging. In this study, we propose a comprehensive evaluation framework for SR, aiming to identify SR algorithms that are truly suitable for aging experimental data. Furthermore, by integrating characterization data from aging experiments, we conduct further validation and knowledge discovery with the selected method. The results obtained from our experimental data demonstrate a strong consistency with those of the proposed evaluation framework. Notably, this research methodology is extensible and can guide knowledge discovery and the elucidation of mechanisms in other polymer materials and diverse material systems.

  • Research Article
    Jiansen Wen, Shuwen Yang, Linqin Jiang, Yudong Shi, Zhihan Huang, Ping Li, Hao Xiong, Ze Yu, Xushan Zhao, Bo Xu, Bo Wu, Baisheng Sa, Yu Qiu

    As the most representative and widely utilized hole transport material (HTM), spiro-OMeTAD encounters challenges including limited hole mobility, high production costs, and demanding synthesis conditions. These issues have a notable impact on the overall performance of perovskite solar cells (PSCs) based on spiro-OMeTAD and hinder its large-scale commercial application. Consequently, there exists a strong demand for high-throughput computational design of novel small-molecule HTMs (SM-HTMs) that are cost-effective, easy to synthesize, and offer excellent performance. In this study, a systematic and iterative design and development process for SM-HTMs is proposed, aiming to accelerate the discovery and application of high-performance SM-HTMs. A custom-developed molecular splicing algorithm (MSA) generated a sample space of 200,000 intermediate molecules, culminating in the creation of a comprehensive database of over 7,000 potential SM-HTM candidates. In total, six promising HTM candidates were identified through MSA, density functional theory calculations, and high-throughput screening. Furthermore, three machine learning algorithms, namely random forest, gradient boosting decision tree, and extreme gradient boosting (XGBoost), were employed to construct predictive models for key material properties, including hole reorganization energy, solvation free energy, maximum absorption wavelength, and hydrophobicity. Among these, the XGBoost-based model demonstrated the best overall performance. The MSA methodology introduced in this study, which combines a comprehensive SM-HTM database with performance prediction models, offers a powerful and universal toolkit for the design and optimization of next-generation SM-HTMs, thereby paving the way for future advancements of PSCs.

  • Research Article
    Yongxiang Li, Xiao Liu, Xiangdong Wang, Wei Xie, Di Qiu, Jiong Yang

    Ablative materials, a special type of thermal protection material, are widely used in extremely high-temperature environments such as hypersonic vehicles and re-entry capsules. They effectively mitigate heat conduction to the interior through ablation at the material surface. Based on traditional physical models and machine learning techniques, we systematically investigated the mapping relationship between multiple material parameters and thermal responses within the carbonized layer and pyrolysis layer of ablative materials. By employing high-throughput physical modeling and the sure independence screening and sparsity operator (SISSO) method for feature selection, we first revealed that the thermal responses of different layers are dominated by distinct material properties (e.g., density, thermal conductivity, heat capacity, etc.). The explicit relationships between the functioning material parameters and the features of the thermal response curves associated with single-/double-layer structures are well established. After the key parameter screening based on SISSO, we further developed a deep neural network surrogate model, capable of accurately predicting the entire thermal response process within the carbonized and pyrolysis layers.

  • Research Article
    Junpeng Song, Ye Shan, Zan Zhang, Shenglong Wang, Haiwei Zhang, Suleman Muhammad, Haiyou Huang, Yongsheng Li

    Creep strain characterizes the degree of creep damage and the creep life of superalloys. The creep process is accompanied by element redistribution and microstructure evolution; understanding the multi-characteristic relationships between creep morphology and strain/stress is essential for the design and prediction of superalloys. Accurate prediction of creep strain necessitates comprehensive feature data. In this work, a phase-field (PF)-informed machine learning (ML) approach is developed to investigate the creep strain of Ni-12.2Al-6Co-2.5Ta (at.%) superalloy. The creep damage crystal plasticity PF model is employed to simulate the creep morphology, composition, and strain evolution. An ML-based quantitative prediction model for creep strain is established to assess the impact of composition and microstructure on creep behavior. Moreover, to enhance the accuracy and generalization of the ML model, statistical features are added using two-point analysis and principal component analysis (PCA) methods for characterizing the two-phase morphology. Additionally, the Shapley Additive Explanations algorithm is used to explain the intrinsic relationships among γ’ rafting, γ’ volume fraction, and creep strain. The phase classification model has an accuracy rate of over 99.2%; the mean square error of the quantitative creep strain prediction model is reduced from 0.304 to 0.235 by using two-point analysis and PCA dimensionality reduction. This study demonstrates the effectiveness of integrating PF information-driven ML in developing image recognition and creep performance prediction models for superalloys.
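The abstract does not say how the two-point analysis is computed. As a rough illustration only, the sketch below evaluates a two-point autocorrelation along one axis of a binary two-phase image, assuming periodic boundaries and a segmentation where 1 marks the γ’ phase. The function name and these choices are assumptions made here, not the authors' implementation, and the subsequent PCA step is not shown.

```python
def two_point_autocorrelation(image, max_shift):
    """Two-point autocorrelation of phase 1 along the x axis.

    image: list of equal-length rows containing 0 or 1 (assumed segmentation).
    Returns [S2(0), S2(1), ..., S2(max_shift)], where S2(r) is the probability
    that two pixels separated by r along x both belong to phase 1.
    Periodic boundaries are assumed, so every pixel contributes to every shift.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    if any(len(row) != n_cols for row in image):
        raise ValueError("all rows must have the same length")

    stats = []
    for shift in range(max_shift + 1):
        hits = 0
        for row in image:
            for x in range(n_cols):
                # Count pairs (x, x + shift) that are both in phase 1.
                hits += row[x] * row[(x + shift) % n_cols]
        stats.append(hits / (n_rows * n_cols))
    return stats


# Toy microstructure: a vertical band of phase 1 occupying half the image.
toy_image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
s2 = two_point_autocorrelation(toy_image, max_shift=2)
```

At zero shift the statistic equals the phase-1 volume fraction, which is why such curves encode both the amount and the spatial arrangement of a phase.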

  • Research Article
    Liangxian Zhang, Ke Zhao, Xu Zhang, Jinling Liu

    The heat-resistant aluminum matrix composite (AMC) exhibits excellent thermal performance due to the presence of heat-resistant dispersed nano-phases. Accurately characterizing high-temperature flow stress is essential for comprehending the mechanisms of deformation and improving material workability. To enhance the accuracy of modeling the flow stress for a new heat-resistant AMC during high-temperature processing, a set of isothermal compression tests at elevated temperatures was conducted. This testing was performed on the composite under varying temperature levels (473, 523, 573, 623, and 673 K) and distinct strain rates (0.001, 0.01, and 0.1 s⁻¹). To accurately characterize the flow stress of the composite material at high temperatures, three distinct models were devised: (1) an Arrhenius model that includes strain compensation; (2) a back-propagation neural network (BPNN) model; and (3) a BPNN model optimized using a genetic algorithm (GA-BPNN). The strain compensation theory enhances the Arrhenius model’s ability to capture nonlinear characteristics, while the genetic algorithm (GA) optimizes the BPNN model’s parameter settings. The accuracy of each model in describing flow stress was compared to determine their effectiveness. The findings demonstrate that the GA-BPNN model achieved superior fitting accuracy, with a root mean square error (RMSE) of 6.48, accompanied by a coefficient of determination (R²) of 0.991 and a mean absolute error (MAE) of 5.4. To evaluate the generalization capabilities of the three models, a new data set was utilized for verification. The GA-BPNN model demonstrates outstanding generalization capability, achieving the highest prediction accuracy on the new data, with R² = 0.9102, RMSE = 9.09, and MAE = 7.83. Using the GA-BPNN model’s fitting results, a hot processing map was developed, and the optimal processing window (573 to 673 K) was identified. This study serves as a valuable reference for optimizing the processing parameters of heat-resistant AMCs and proposes a novel approach combining strain compensation and machine learning for high-temperature flow stress description. While the current framework demonstrates computational robustness, extending conclusions to composites with significantly different compositions requires further validation.
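For readers comparing the reported RMSE, MAE, and R² values, the following minimal sketch shows how these three metrics are commonly computed from measured and predicted flow stresses. The function name and the sample numbers are hypothetical, and R² is taken here as 1 − SS_res/SS_tot; the abstract does not state which definition the authors used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for paired measured and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical flow-stress values, for illustration only.
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 205.0, 245.0]
metrics = regression_metrics(measured, predicted)
```

RMSE and MAE carry the units of the target quantity, whereas R² is dimensionless, so the three are best read together rather than in isolation.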

  • Research Article
    Haoming Zhang, Mo Cheng, Xuanyu Jiang, Hui Zhang, Xiaodong Pi, Deren Yang, Tianqi Deng

    Silicon carbide (SiC) is a representative high-thermal-conductivity semiconductor for applications in power electronics and quantum devices. In these applications, thermal management becomes critical for stable functioning. Meanwhile, SiC crystallizes into hundreds of polytypes with various stacking orders and small energy differences. Such a characteristic also leads to the formation of stacking faults. To understand the thermal transport property of SiC in the presence of polytypism and crystallographic defects, we have developed a neuroevolution potential for SiC. The model is trained on a diverse dataset with stoichiometric and off-stoichiometric Si_xC_y configurations. The dataset is strategically sampled using graph network encoding and clustering in the feature space. The potential achieves high accuracy, with an energy error of 4.1 meV/atom and a force error of 0.22 eV/Å on the test set. Dataset diversity is found to be critical to model robustness. The application of this model is demonstrated by simulating the thermal transport property of SiC polytypes and stacking faults using homogeneous nonequilibrium molecular dynamics. The thermal resistance of stacking faults can reach substantial values of up to 10⁻¹⁰ K·m²·W⁻¹, which may create hot spots and impede heat dissipation in devices. The thermal resistance varies considerably among different fault types, with those exhibiting consecutive cubic k stacking configurations demonstrating remarkably lower resistance. This work provides an accurate and efficient machine-learning interatomic potential for simulations of thermal and phonon properties of SiC and for understanding its thermal management in the presence of polytypism and crystallographic defects.

  • Research Article
    Weibin Ma, Ling Li, Yu Zhang, Minjie Li, Na Song, Peng Ding

    It is of significant importance to design flame-retardant polymeric composites (FRPCs) with superior flame retardancy and appropriate mechanical properties. However, discovering such materials is often reliant on serendipity, as the conventional “trial-and-error” approach is inadequate for navigating the vast virtual space. To overcome this challenge, we propose an active generative design framework to accelerate the development of FRPCs within the expansive virtual space. This framework operates as a closed-loop system, integrating machine learning, a knowledge-embedded generative model, and experimental exploration. Through this approach, we derived two interpretable linear expressions and identified a key composition threshold: when the mass fraction of zinc stannate is below 2.5% and that of piperazine pyrophosphate exceeds 12.5%, the flame retardancy of polypropylene (PP)-based FRPCs is significantly enhanced. By processing and characterizing 10 FRPCs, we successfully designed two composites with flame retardancy improved by 1% compared to the top-performing reference FRPC in the initial dataset, without compromising mechanical properties. This work effectively resolves the trade-off between flame retardancy and mechanical performance at a low cost, demonstrating a promising pathway for the accelerated discovery of PP-based FRPCs with balanced properties.

  • Review
    Yu Shu, Naihua Miao, Rize Li, Yucheng Lin, Siyu Han, Jian Zhou, Zhimei Sun

    The development of advanced optoelectronic materials constitutes a pivotal frontier in modern energy and communication technologies, facilitating critical energy-photon-electron interconversion processes that underpin sustainable energy infrastructures and high-performance electronic devices. However, the discovery and optimization of novel optoelectronic materials face substantial hurdles arising from complicated structure-property interdependencies, prohibitive development costs, and protracted innovation cycles. Conventional empirical approaches and computational simulations usually exhibit limited efficacy in addressing the escalating demands for materials with superior stability, economic viability, and customizable electronic properties. The integration of machine learning (ML) with high-throughput screening has emerged as a transformative strategy to address these challenges. By rapidly processing large multidimensional datasets and predicting critical material properties such as electronic structure, thermodynamic stability, and charge transport behaviors, ML offers unprecedented capabilities in the efficient and rational design of high-performance optoelectronic materials. This review provides a comprehensive overview of cutting-edge ML-driven methodologies in efficient optoelectronic materials discovery with emphasis on critical workflows, data integration strategies, and model frameworks. We also discuss the challenges and prospects for ML applications, particularly in data standardization, model interpretability and closed-loop experimental validation. We further propose the potential of artificial intelligence and autonomous laboratories to build a powerful discovery pipeline to advance the development of high-performance optoelectronic materials.

  • Research Article
    Jin Li, Jie Ma, Jie Wu, Wentao He, Qian Xiang, Jian-Min Ma, Mingjun Hu

    Significant efforts have been made to investigate the relationship between generic fractions and bulk properties of asphalt, the most widely used binder in road pavement engineering. However, due to limited data availability, advanced data mining techniques, such as machine learning (ML), have rarely been applied in the field. This study aimed to collect extensive data on asphalt generic fractions and bulk properties and to explore their underlying linkage using ML methods. A total of 800 datasets for the saturate, aromatic, resin, and asphaltene (SARA) fractions of asphalt were collected and analyzed across various asphalt types. The generic fractions and derived indices were used as input variables in ML models to predict key asphalt properties, including penetration, softening point, rutting factor, and rotational viscosity. The contribution of different generic fractions, derived indices, and additional variables (e.g., asphalt type and geographical origin) to these properties was quantified using the SHapley Additive exPlanations (SHAP) technique. Among the ML models evaluated, adaptive boosting (AdaBoost) showed the best predictive performance, while the support vector machine demonstrated greater robustness. SHAP analysis revealed that penetration was primarily influenced by the proportions of asphaltenes and saturates, while asphaltene content and the asphaltenes index were the most significant predictors for other properties, such as softening point, rutting factor, and rotational viscosity. Including asphalt type and geographical origin as categorical variables in the models further improved prediction accuracy. This study highlights the potential of ML techniques in uncovering complex relationships between asphalt fractions and their bulk properties, surpassing conventional statistical approaches, though challenges remain.

  • Research Article
    Yuchao Tang, Bin Xiao, Manabu Ihara, Sergei Manzhos, Yi Liu

    Prediction of materials properties from descriptors of chemical composition and structure with machine learning (ML) methods has been emerging as a viable approach to materials design and is a major component of the materials informatics field. However, as both experimental and computed data may be costly, one often has to work with limited data, which increases the risk of overfitting. Two strategies can be used to address this issue: combining various datasets to improve sampling, and designing optimal ML models from small datasets. Center-environment (CE) features were recently introduced and showed promise in predicting formation energies, structural parameters, band gaps, and adsorption properties of various materials. Here, we use CE features to predict the formation energies of Nb and Nb-Nb5Si3 eutectic alloys substituted with various alloying elements in the Nb and Nb5Si3 phases, a typical alloy system where the data can be naturally divided into subsets based on the types of substitutional sites. We explore the effects of dataset combination and of the functional form of the dependence of the target property on the features. We show that combining the subsets, despite the increased amount of data, can complicate rather than facilitate ML, as different subsets do not increase the density of sampling but sample different parts of space with different distribution patterns, and also have different optimal hyperparameters. The Gaussian process regression-neural network hybrid ML method was used to separate the effects of nonlinearity and inter-feature coupling and to show that while nonlinearity is unimportant for Nb alloys, it is critical for Nb-Nb5Si3 alloys. We find that inter-feature coupling terms are unimportant or non-recoverable, demonstrating the utility of more robust and interpretable additive models.

  • Research Article
    Yuze Liu, Lejia Wang, Weigang Zhu, Xi Yu

    Machine learning (ML) model development in chemistry and materials science often grapples with the challenge of small and imbalanced labeled datasets, a common limitation in experimental studies. These dataset imbalances can precipitate overfitting and diminish model generalization. Our study explores the efficacy of the farthest point sampling (FPS) strategy within targeted chemical feature spaces, demonstrating its capacity to generate well-distributed training sets and consequently enhance model performance. We rigorously evaluate this strategy across various ML models, including artificial neural networks, support vector machines, and random forests, using datasets with target physicochemical properties such as standard boiling points and enthalpy of vaporization. Our findings reveal that FPS-based models consistently surpass randomly sampled models, exhibiting superior predictive accuracy and robustness, alongside a marked reduction in overfitting. This improvement is particularly pronounced for smaller training sets, attributable to increased diversity within the training data’s chemical feature space. Consequently, FPS emerges as an effective and adaptable approach for achieving high-performance ML models at reduced cost from the limited and biased experimental datasets typical in chemistry and materials science.
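Farthest point sampling is usually implemented as a greedy loop that repeatedly adds the point lying farthest from everything selected so far. The sketch below is a minimal version under assumptions made here for illustration: Euclidean distance, a fixed starting index, and plain tuples as feature vectors. The abstract does not specify the authors' distance metric, feature space, or initialization.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily select k point indices that are spread out in feature space.

    points: sequence of equal-length numeric tuples (assumed feature vectors).
    start: index of the first selected point (an arbitrary choice here).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    selected = [start]
    # Distance from each point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < k:
        # The next sample is the point farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy 2D feature space with a dense cluster near the origin.
features = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (10, 0)]
chosen = farthest_point_sampling(features, k=3)
```

In this toy case the selection skips the near-duplicate points around the origin and picks one representative from each region, which is the behavior that makes FPS attractive for small, imbalanced datasets.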

  • Research Article
    Jianhua Chen, Junwei Chen, Boyu Zhao, Yunying Fan, Zhigang Yu, Jun Luan, Kuochih Chou

    Machine learning models demonstrate remarkable capabilities in predicting the properties of novel materials. The optimal model can theoretically be obtained through an exhaustive search of data subsets, algorithms, and hyperparameters. However, the fundamental challenge lies in identifying the most efficient pathway through this immense search space. In this paper, we address this challenge by proposing an active learning-based data screening and model retrieval framework, which can develop enhanced models based on internal data while incorporating additional external data to further improve model performance. Systematic validation studies were conducted using four datasets, comprising both classification and regression data. Superior models were obtained within 10 iterative cycles for all cases, achieving a 3.3%-10.3% improvement compared to state-of-the-art results in current literature. Among the results, the framework reduced modeling error by 10.3% for AlCoCrCuFeNi hardness internal data and achieved a more significant error reduction of 42.6% through the integration of additional external hardness data. The framework achieves an ideal balance between computational efficiency and predictive accuracy while enabling deeper data exploration, with its low-code implementation and user-friendly characteristics making it a promising tool for materials design.

  • Review
    Yuelin Wang, Chengquan Zhong, Jingzi Zhang, Jiakai Liu, Kailong Hu, Junjie Chen, Xi Lin

    Thermoelectric materials enabling direct interconversion between thermal and electrical energy hold transformative potential for sustainable energy technologies, particularly in solid-state power generation and precision refrigeration systems. The pursuit of high-performance thermoelectric materials with exceptional energy conversion efficiency has remained a persistent challenge in materials science, primarily constrained by the resource-intensive nature of traditional experimental approaches and computationally demanding first-principles simulations. The emergence of machine learning (ML) techniques has revolutionized this field by enabling rapid screening of material candidates and establishing quantitative structure-property relationships. This comprehensive review systematically examines cutting-edge methodologies in ML-driven thermoelectric materials research, with particular emphasis on three pivotal aspects: (1) predictive modeling of key performance parameters including electrical conductivity, Seebeck coefficient, and lattice thermal conductivity through advanced feature engineering and algorithm selection; (2) inverse design strategies for optimizing carrier concentration and phonon scattering mechanisms; (3) application-specific material optimization frameworks integrating multi-objective constraints. Furthermore, we critically analyze prevailing challenges in data quality, model interpretability, and cross-scale prediction accuracy, while proposing future research directions encompassing active learning paradigms, generative adversarial networks for virtual material synthesis, and hybrid physics-informed ML architectures.