Journal of Materials Informatics

2025-11-25 2025, Volume 5 Issue 4

Research Article

Accelerating phase-field simulation of coupled microstructural evolution using autoencoder-based recurrent neural networks

Aidan Gesch, Chongze Hu

2025, 5(4): 42. https://doi.org/10.20517/jmi.2025.23

Phase-field frameworks leveraging time-dependent neural networks have recently been developed to accelerate microstructure-based phase-field simulations in both temporal and spatial domains. However, most of these frameworks have been designed for phase-field problems involving a single variable field, such as spinodal decomposition. In this study, we developed an accelerated framework for predicting the microstructural evolution of Ostwald ripening, a classical phase-field problem involving multiple interdependent parameter fields. This framework integrates three components: high-throughput phase-field simulations for generating a high-quality microstructure database, autoencoder-based dimensionality reduction to transform 2D microstructure images into latent representations, and long short-term memory (LSTM) networks serving as the microstructure learning engine. Our results demonstrate that autoencoder techniques can effectively reduce the large dimension of microstructure images into 16 key values, while maintaining high accuracy in reconstructing these reduced representations back to their original space. Using these latent representations, LSTM models are employed to capture the key microstructural features of Ostwald ripening and predict their evolution over future time sequences, with a speedup of approximately 3.35 × 10⁵ times compared to the high-fidelity phase-field simulations. The accelerated framework presented in this work is the first data-driven emulation specifically designed for coupled phase-field problems, and it can be easily extended to predict other evolutionary phenomena with more complex microstructural features.

Review

A critical review of machine learning interatomic potentials and Hamiltonian

Yifan Li, Xiuying Zhang, Mingkang Liu, Lei Shen

2025, 5(4): 43. https://doi.org/10.20517/jmi.2025.17

Machine learning interatomic potentials (ML-IAPs) and machine learning Hamiltonian (ML-Ham) have revolutionized atomistic and electronic structure simulations by offering near ab initio accuracy across extended time and length scales. In this Review, we summarize recent progress in these two fields, with emphasis on algorithmic and architectural innovations, geometric equivariance, data efficiency strategies, model-data co-design, and interpretable AI techniques. In addition, we discuss key challenges, including data fidelity, model generalizability, computational scalability, and explainability. Finally, we outline promising future directions, such as active learning, multi-fidelity frameworks, scalable message-passing architectures, and methods for enhancing interpretability, which is particularly crucial for the field of AI for Science (AI4S). The integration of these advances is expected to accelerate materials discovery and provide deeper mechanistic insights into complex material and physical systems.

Research Article

Application of self-supervised learning in steel surface defect detection

Shiyu Hu, Xudong Ma, Yuqi Zhang, Wei Xu

2025, 5(4): 44. https://doi.org/10.20517/jmi.2025.21

In scientific research, effective utilization of unlabeled data has become pivotal, as exemplified by AlphaFold2, which won the 2024 Nobel Prize. Pioneering this paradigm shift, we develop a universal self-supervised learning methodology for detecting surface defects in steel materials. By harnessing unlabeled data, our approach significantly reduces the dependence on manual annotation and enhances scalability while training robust models capable of generalizing across defect types. Using a Faster R-CNN framework, we achieved a mean average precision (mAP) of 0.385 and an mAP at IoU = 0.5 (mAP_50) of 0.768 on the NEU-DET steel defects dataset. These results demonstrate both the efficacy of our self-supervised strategy and its potential as a framework for developing image detection systems with minimal labeled data requirements in surface defect identification.
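The mAP_50 metric quoted above counts a predicted box as a correct detection when its intersection over union (IoU) with a ground-truth box reaches 0.5. The following is a minimal sketch of that matching criterion only, not the authors' evaluation code; the function names, the `(x_min, y_min, x_max, y_max)` box layout, and the use of `>=` at the threshold are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, threshold=0.5):
    """Hypothetical helper: does a prediction match at the given IoU threshold?"""
    return iou(pred_box, truth_box) >= threshold
```

For example, two 2 × 2 boxes offset by one unit in each direction overlap in a 1 × 1 square, giving an IoU of 1/7, which would not count as a match at the 0.5 threshold.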

Research Article

Data-driven OLED candidate design: a generative model from independent-property domains to the comprehensive performance enhancement

Xinxin Niu, Zhiyao Su, Luyu Wang, Wenbin Shi, Hengyue Zhang, Yanfeng Dang, Yuan Yuan, Yajing Sun, Wenping Hu

2025, 5(4): 45. https://doi.org/10.20517/jmi.2025.22

The discovery of high-performance organic light-emitting diode (OLED) materials is hindered by conventional human-aware design methodologies and the scarcity of pure organic luminescent scaffolds. Although machine learning models have improved the efficiency of high-throughput screening for OLED candidates, their effectiveness is still limited by the small size and low quality of available experimental datasets. In this study, we introduced LumiGen, an integrated framework for the de novo design of high-quality OLED candidate molecules with targeted photophysical properties. A sampling-screening iterative process was designed to gradually refine the molecular selection, enabling the transition from independent-property optimization to all-round OLED candidates. Among the collected high-quality OLED candidate molecules, computational estimates indicate that the optical properties of most molecules (approximately 80.2%) meet the required criteria. During the iterative training process, the Sampling Augmentor enhances the proportion of OLED candidate molecules by over threefold (from 6.56% to 21.13%). Additionally, we successfully synthesized a new molecular scaffold from the OLED candidates, achieving a photoluminescence quantum yield of up to 88.6%. According to the statistics, only 0.33% of the molecules in the dataset outperform our synthesized molecules in terms of overall optical performance. LumiGen demonstrates the ability to learn molecular distribution patterns from disjoint labeled datasets, enabling the direct generation of all-round OLED candidates, thereby advancing OLED material discovery.

Perspective

Heterogeneous catalyst design by generative models

Chao Yang, Lulu Wang, Jinbo Zhu, Pengfei Ou

2025, 5(4): 46. https://doi.org/10.20517/jmi.2025.38

Understanding interactions between reactive species and surfaces remains a fundamental challenge in materials science and heterogeneous catalysis. Central to this challenge is the efficient and accurate generation of realistic surfaces and intermediate structures. Despite growing efforts, a universal and systematic approach to surface structure generation is still lacking, particularly for complex interfaces. Existing automated protocols often require extensive computations to identify stable configurations. Recent advances in dataset availability and machine learning techniques, especially in generative models, are beginning to show promise for tasks such as catalyst structure generation. In this perspective, we highlight the emerging capabilities of generative models in catalytic research and outline future directions for their applications. These include property-guided surface structure generation, efficient sampling of adsorption geometries, and the generation of complex transition-state structures. We aim to provide catalysis researchers with a clear view of current progress, outline key challenges, and identify opportunities for integrating generative models into the design and discovery of heterogeneous catalysts.

Review

Machine learning-driven design and optimization of electronic packaging: applications and future developments

Xiangyu Chen, Sirui He, Kyung-Wook Paik, Yew-Hoong Wong, Shuye Zhang

2025, 5(4): 47. https://doi.org/10.20517/jmi.2025.26

Machine learning (ML) provides robust solutions for electronic packaging, where growing complexity and miniaturization challenge traditional methods in design, defect detection, and performance optimization. This review systematically covers ML applications across key areas in electronic packaging, such as defect detection, material optimization, and reliability analysis, discussing key algorithms, data workflows, inherent challenges, and prospects. It aims to provide a clear roadmap and reference for effectively applying ML to innovate in this rapidly evolving field. However, addressing persistent challenges in data quality, model adaptability, and integration with established engineering practices remains vital for continued progress in this domain.

Research Article

FIND: a forward–inverse navigation and discovery platform for hydrogen storage alloys powered by data-driven machine learning

Xuao Lu, Shiwen Luo, Jiongyang Li, Minjie Chen, Tongao Yao, Zhuoran Xu, Yujie Yan, Jun Li, Xuqiang Shao, Zhengyang Gao, Weijie Yang

2025, 5(4): 48. https://doi.org/10.20517/jmi.2025.56

High-performance solid-state hydrogen storage alloys are among the key factors enabling the widespread application of hydrogen energy. However, current materials still face challenges such as limited hydrogen storage capacity and excessive thermodynamic stability, which urgently need to be addressed. In this work, we constructed a large-scale solid-state hydride database, encompassing over 1,000 alloy systems and more than 6,000 valid data records. By integrating alloying strategies with machine learning (ML) techniques, the Magpie tool was utilized for feature generation, and a multi-objective regression model was developed to simultaneously predict absorption/desorption plateau pressure, enthalpy change, entropy change, and maximum hydrogen storage capacity using various ML algorithms. Furthermore, we achieved the inverse design of solid-state hydrogen storage materials using a variational autoencoder. By integrating the forward prediction and inverse design models, we developed a forward–inverse navigation and discovery platform for hydrogen storage alloys powered by data-driven ML: FIND. The forward module enables rapid prediction of absorption and desorption properties based on alloy composition and testing temperature. Building upon this, an advanced function allows fast prediction for multicomponent systems with flexible molar ratios. Subsequently, the inverse module facilitates the screening of potential alloy candidates based on user-defined target properties. Finally, the predictive models were integrated with a genetic algorithm to optimize alloy compositions within the Mg–Ni–La–Ce and Mg–Ni–La systems. Multiple novel high-performance alloy candidates were identified, providing a powerful tool and methodological foundation for high-throughput screening and intelligent development of hydrogen storage materials.

Research Article

Atomistic underpinnings for dislocation emission behaviors at the crack tips in FCC metals in light of thermo-kinetic synergy

Kunyu Zhang, Jinglian Du, Jianwei Xiao, Ziding Yang, Jiaqi Yang, Kexing Song, Feng Liu

2025, 5(4): 49. https://doi.org/10.20517/jmi.2025.60

Dislocation emission at the crack tips in face-centered cubic (FCC) metals generally involves multiple mechanisms of dislocation nucleation dominated by the motion behaviors of atoms near the crack tips. In this work, taking FCC aluminum as a representative example, the emission behaviors and nucleation mechanism of dislocations at the crack tips in FCC metals are investigated in light of the continuum mechanics models within the framework of the anisotropic linear elastic fracture mechanics. The system energy evolution and motion behaviors of atoms near the crack tips under Mode I loading conditions are obtained first by performing molecular dynamics simulations. The underlying thermo-kinetic origins of dislocation emission are clarified based on driving force U and energy barrier Q_k for dislocation emission at the crack tips acquired from system energy variations. As the applied load rises, the value of U increases while that of Q_k decreases, accelerating the dislocation emission process. The magnitude of Q_k is proportional to the dislocation nucleation energy Q, which depends on the extrema of the generalized stacking fault (SF) energy curve along the dislocation emission direction (including the unstable SF energy γ_usf, intrinsic SF energy γ_isf, and unstable twinning fault energy γ_utf). When abnormal fluctuations appear in the system energy evolution curve, the energy barrier Q_k for dislocation emission undergoes a sudden change, signifying the transition of the dislocation nucleation mechanism. Accordingly, a thermo-kinetic criterion for the mechanism transition of dislocation nucleation at the crack tips in FCC metals is proposed. Our investigation provides an innovative thermo-kinetic perspective to understand the dislocation emission behaviors and the critical conditions for maintaining stability at the crack tips of FCC metals.

Conference Report

Unlocking the future of materials science: key insights from the DCTMD workshop

Rika Kobayashi, Roger D. Amos, T. Daniel Crawford, Hongxia Hao, Yi Liu, Turab Lookman, Rampi Ramprasad, Matthias Scheffler, Hong Wang, Tong-Yi Zhang

2025, 5(4): 50. https://doi.org/10.20517/jmi.2025.44

The International Workshop on Data-Driven Computational and Theoretical Materials Design was held from October 9 to 13, 2024, in Shanghai, gathering leading scientists and researchers from around the world, representing various aspects of data-driven AI methodologies and applications in materials design. The topics, covered across 46 talks and 29 posters, spanned a wide range of the latest advancements, including Machine Learning for Materials Design, Method Development, Machine Learning Interatomic Potentials, Advanced Computing, Infrastructure and Standards, Large Language Models, and Autonomous Labs. As part of the workshop, a panel discussion titled “Unlocking the AI Future of Materials Science” was held to disseminate the state of the art of AI/ML in materials science and consider directions for the future. This report is a synthesis, for this Special Issue, of the panel discussion, drawing on insights gained from the workshop as a whole and surrounding conversations, in particular the question of what constitutes success.

Research Article

MatSci-ML Studio: an interactive workflow toolkit for automated machine learning in materials science

Yu Wang, Fei Wang, Guangmao Yan, Jun Wang, Guodong Niu, Jing Feng, Jian Mao, Yan Zhao

2025, 5(4): 51. https://doi.org/10.20517/jmi.2025.45

Machine learning (ML) has become a cornerstone of modern materials science, offering powerful tools for predicting material properties and accelerating experimental workflows. However, its widespread adoption is often hindered by the steep learning curve associated with programming languages such as Python, which presents a significant technical barrier for many domain experts. To address this challenge, we introduce MatSci-ML Studio: an interactive and user-friendly software toolkit designed to empower materials scientists with limited coding expertise. In contrast to traditional code-based frameworks, MatSci-ML Studio features an intuitive graphical user interface that encapsulates a comprehensive, end-to-end ML workflow. This integrated platform seamlessly guides users through data management, advanced preprocessing, multi-strategy feature selection, automated hyperparameter optimization, and model training, democratizing advanced computational analysis for the materials community. Notably, it incorporates advanced capabilities such as a SHapley Additive exPlanations-based interpretability analysis module for explaining model predictions and a multi-objective optimization engine for exploring complex design spaces. The practicality and effectiveness of MatSci-ML Studio are demonstrated through representative case studies, confirming its capacity to lower the technical barrier for ML applications, foster innovation, and significantly enhance the efficiency of data-driven materials science.

Research Article

Stacked machine learning for accurate and interpretable prediction of MXenes’ work function

Lijun Shang, Yongli Yang, Yadong Yu, Pan Xiang, Li Ma, Zhonglu Guo, Mengyan Dai

2025, 5(4): 52. https://doi.org/10.20517/jmi.2025.36

MXenes, with tunable compositions and rich surface chemistry, enable precise control of electronic, optical, and mechanical properties, making them promising materials in electronics and energy-related applications. In particular, the work function plays a critical role in determining their physicochemical properties. However, the accurate prediction of the work function of MXenes with machine learning (ML) remains challenging due to the lack of robust models with high accuracy and interpretability. To this end, we propose a stacked model and introduce high-quality descriptors constructed via the Sure Independence Screening and Sparsifying Operator (SISSO) method to improve the prediction accuracy of the work function of MXenes. The stacked model initially generates predictions from multiple base models, and then employs these predictions as inputs to a meta-model for secondary learning, thereby enhancing both predictive performance and generalization capability. The results show that by integrating the high-quality descriptors, the model’s performance improves significantly, yielding a coefficient of determination of 0.95 and a mean absolute error of 0.2. Finally, we demonstrate that MXenes’ work functions are predominantly governed by their surface functional groups, where SHapley Additive exPlanations value analysis quantitatively resolves the structure–property relationship between surface functional groups and the work function of MXenes. Specifically, O terminations can lead to the highest work functions, while OH terminations result in the lowest value (over 50% reduction), and transition metals or C/N elements have a relatively smaller effect. This work achieves an optimal balance between accuracy and interpretability in ML predictions of MXenes’ work functions, providing both fundamental insights and practical tools for materials discovery.
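The two-stage idea described above (base models first, then a meta-model trained on their predictions) can be illustrated with a small standard-library sketch. This is not the authors' model: the abstract does not name the base learners or the meta-learner, so the two toy regressors, the no-intercept linear blend, the three-fold split, and every function name below are assumptions chosen only to show the data flow.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Toy base model 1: ordinary least squares for y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_nearest(xs, ys):
    """Toy base model 2: 1-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


BASE_FITTERS = [fit_linear, fit_nearest]  # hypothetical stand-ins


def out_of_fold_predictions(xs, ys, n_folds=3):
    """Predict each sample with base models trained on the other folds,
    so the meta-model never sees predictions made on training points."""
    n = len(xs)
    oof = [None] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held = [i for i in range(n) if i % n_folds == fold]
        models = [fit([xs[i] for i in train], [ys[i] for i in train])
                  for fit in BASE_FITTERS]
        for i in held:
            oof[i] = [m(xs[i]) for m in models]
    return oof


def fit_meta(oof, ys):
    """Meta-model: least-squares blend weights for the two base predictions
    (2x2 normal equations solved by Cramer's rule, no intercept)."""
    s00 = sum(p[0] * p[0] for p in oof)
    s01 = sum(p[0] * p[1] for p in oof)
    s11 = sum(p[1] * p[1] for p in oof)
    t0 = sum(p[0] * y for p, y in zip(oof, ys))
    t1 = sum(p[1] * y for p, y in zip(oof, ys))
    det = s00 * s11 - s01 * s01
    return ((t0 * s11 - t1 * s01) / det, (t1 * s00 - t0 * s01) / det)


def fit_stack(xs, ys):
    """Fit the meta-model on out-of-fold predictions, then refit base models on all data."""
    weights = fit_meta(out_of_fold_predictions(xs, ys), ys)
    models = [fit(xs, ys) for fit in BASE_FITTERS]
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))
```

Training the meta-model on out-of-fold rather than in-sample predictions is a common design choice in stacking, because it keeps the meta-model from simply rewarding whichever base model memorizes the training set best.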

Research Article

RF-NSGA-II framework for inverse design of high-performance Mg-Gd-based magnesium alloys

Yunchuan Cheng, Lei Wang, Zhihua Dong, Zengyong Zheng, Zhihong Xia, Shengwen Bai, Jiangfeng Song, Bin Jiang

2025, 5(4): 53. https://doi.org/10.20517/jmi.2025.61

An inverse design framework, RF-NSGA-II, is developed using machine learning (ML) methods and a multi-objective co-optimization strategy. It enables intelligent design of chemical compositions and processing parameters for thermo-mechanical treatments based on desired mechanical properties of Mg alloys. Using a database of extruded Mg-Gd and Mg-Y-based alloys, RF-NSGA-II integrates an optimized forward model with a non-dominated sorting genetic algorithm II (NSGA-II). The forward model is constructed by evaluating the performance of different ML algorithms, with the random forest (RF) algorithm experimentally validated to accurately describe the relationship between chemical composition and mechanical properties. RF-NSGA-II simultaneously optimizes multiple mechanical properties, and validation through experimental measurements demonstrates its effectiveness. Using target mechanical properties as inputs, chemical compositions and processing parameters for solid-solution treatment and extrusion are efficiently determined for a high-strength Mg-11.5Gd-6.0Y-1.0Zn-0.2Mn (wt.%) alloy and a high-ductility Mg-2.5Gd-1.0Zn (wt.%) alloy, achieving tensile strength/elongation values of 417 MPa/3.2% and 223 MPa/34%, respectively. These results provide a transparent and effective route for the inverse design of advanced Mg alloys based on desired mechanical properties.
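The trade-off between the two reported alloys (high strength with low elongation versus lower strength with high elongation) is what non-dominated sorting, the ranking step at the core of NSGA-II, is meant to capture. Below is a simplified sketch of that ranking for maximized objectives; it uses a plain repeated-filter loop rather than the bookkeeping of the fast non-dominated sort in NSGA-II, and it is not the authors' implementation. The first two points are the tensile strength/elongation pairs reported above; the other two are hypothetical candidates added for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated_sort(points):
    """Split point indices into successive fronts; front 0 is the Pareto front."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point joins the current front if nothing still remaining dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# (tensile strength in MPa, elongation in %)
candidates = [
    (417.0, 3.2),   # high-strength alloy reported in the abstract
    (223.0, 34.0),  # high-ductility alloy reported in the abstract
    (300.0, 10.0),  # hypothetical candidate
    (200.0, 5.0),   # hypothetical candidate
]
fronts = non_dominated_sort(candidates)
```

Here the first three candidates land in the first front, since none beats another in both strength and elongation, while the last is dominated and falls into the second front.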

Perspective

Exploring materials data through collaboration: 2024 KRICT ChemDX Hackathon

Su-Hyun Yoo, Andre K. Y. Low, Jose Recatala-Gomez, Harikrishna Sahu, Chiho Kim, Joonyoung F. Joung, Hoje Chun, Katerina A. Christofidou, Joshua Berry, Michail Minotakis, Kisung Kang, Kwang-soo Kim, Gaheun Shin, Hyunwoo Jang, Sanghyuk Lee, Minkyu Park, Byung-Hyun Kim, Kihyun Shin, Jungho Shin, Aloysius Soon, Joshua Schrier, Woosun Jang

2025, 5(4): 54. https://doi.org/10.20517/jmi.2025.65

Data-driven research is in the spotlight across many science and engineering fields, including materials science, with the expectation that effective utilization of data, supported by modern artificial intelligence techniques, can lead to breakthroughs in addressing key scientific questions. The Korea Research Institute of Chemical Technology (KRICT) Chemical Data Explorer platform (ChemDX), our web-based and integrated platform, which includes various data explorer and artificial intelligence modules, aims to enhance the accessibility of chemical data for digital materials discovery. In this article, we highlight the results of the 2024 KRICT ChemDX Hackathon, an event to support data-driven research in chemistry and materials science. Hackathon participants explored the ChemDX platform and developed projects ranging from machine learning models and data visualization tools to user interface improvements. These projects demonstrated the versatility and potential of data-driven research, with the aid of the ChemDX platform, in bridging data-driven experimental and computational research. The feedback and outcomes from this hackathon demonstrate the impressive potential of interdisciplinary data-driven research, guide further improvements to the platform, and enhance its usability and outreach.

Research Article

Deep learning-enhanced cellular automaton framework for modeling static recrystallization behavior

Yulong Zhu, Yu Cao, Boyang Xu, Jieke Zhang, Qubo He, Rui Luo, Xuhong Jia, Quanyi Liu, Ziyong Hou

2025, 5(4): 55. https://doi.org/10.20517/jmi.2025.48

Accurately characterizing dislocation behavior, the driving force behind nucleation and growth of recrystallized grains, has long been a formidable challenge for traditional cellular automata methods. We are proud to unveil a groundbreaking, machine learning-enhanced cellular automaton framework that fundamentally transforms the mapping of dislocation substructures while expertly modeling static recrystallization (SRX) behavior. By integrating the dislocation escape assumption during recovery, we decisively eliminate spatial resolution limitations and capture the intricate mesoscopic dynamics of dislocation evolution with unprecedented precision. Our innovative approach has demonstrated extraordinary effectiveness in predicting the recrystallization kinetics of a typical austenitic alloy, bolstered by strong experimental validation. The deep learning-based dislocation implantation module, SRX-net, stands out as a game-changer, surpassing traditional techniques such as random forests and U-net by showcasing exceptional capabilities in identifying complex intracrystalline substructures and managing uneven strain concentrations. The proposed advanced simulations yield critical insights into dislocation density variations, highlighting significant local fluctuations driven by dislocation escape. Importantly, while the migration and accumulation of dislocations may falter in meeting nucleation conditions during grain growth, our model excels in accurately predicting average SRX grain sizes without introducing any unphysical artifacts. This revolutionary framework dramatically reduces time-to-solution, empowering comprehensive parametric studies and enabling near real-time recrystallization simulations, thus setting a bold new standard for industrial applications.

Research Article

Ultralow thermal conductivity via weak interactions in PbSe/PbTe monolayer heterostructure for thermoelectric design

Ruihao Tan, Kaiwang Zhang, Yue-Wen Fang

2025, 5(4): 56. https://doi.org/10.20517/jmi.2025.62

In this study, we systematically investigate the thermal and electronic transport properties of a two-dimensional (2D) PbSe/PbTe monolayer heterostructure by combining first-principles calculations, Boltzmann transport theory, and machine learning methods. The heterostructure exhibits a unique honeycomb-like corrugated and asymmetric configuration, which significantly enhances phonon scattering. Moreover, the relatively weak interatomic interactions in PbSe/PbTe lead to the formation of antibonding states, resulting in strong anharmonicity and ultimately yielding ultralow lattice thermal conductivity (κ_L). In the four-phonon scattering model, the κ_L values along the x and y directions are as low as 0.37 and 0.31 W·m⁻¹·K⁻¹, respectively. Contrary to the conventional view that long mean free path acoustic phonons dominate heat transport, we find that optical phonons contribute approximately 59% of the κ_L in this heterostructure due to their larger group velocities than the acoustic phonons. Further analysis of thermoelectric performance shows that at a high temperature of 800 K, the heterostructure achieves an exceptional dimensionless figure of merit (ZT) of 5.3 along the y direction, indicating outstanding thermoelectric conversion efficiency. These findings not only provide theoretical insights into the transport mechanisms of PbSe/PbTe monolayer heterostructure but also offer a practical design strategy for developing high-performance 2D layered thermoelectric materials.
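For readers less familiar with the figure of merit, the standard textbook definition of ZT shows why a low lattice thermal conductivity raises thermoelectric efficiency. The abstract does not write this relation out; the symbols S (Seebeck coefficient), σ (electrical conductivity), and κ_e (electronic thermal conductivity) are introduced here for illustration and do not appear in the abstract.

```latex
ZT = \frac{S^{2}\,\sigma\,T}{\kappa_{\mathrm{e}} + \kappa_{\mathrm{L}}}
```

Because κ_L appears only in the denominator, reducing it at fixed S, σ, and T increases ZT directly.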