Accelerated phase-field frameworks based on time-dependent neural networks have recently been developed to speed up microstructure-based phase-field simulations in both the temporal and spatial domains. However, most of these frameworks have been designed for phase-field problems involving a single variable field, such as spinodal decomposition. In this study, we developed an accelerated framework for predicting the microstructural evolution of Ostwald ripening, a classical phase-field problem involving multiple interdependent parameter fields. The framework integrates three components: high-throughput phase-field simulations for generating a high-quality microstructure database, autoencoder-based dimensionality reduction that transforms 2D microstructure images into latent representations, and long short-term memory (LSTM) networks serving as the microstructure learning engine. Our results demonstrate that autoencoders can effectively compress each high-dimensional microstructure image into 16 latent values while reconstructing the original images from these reduced representations with high accuracy. Using these latent representations, LSTM models capture the key microstructural features of Ostwald ripening and predict their evolution over future time sequences, with a speedup of approximately 3.35 × 10⁵ compared to the high-fidelity phase-field simulations. The accelerated framework presented in this work is the first data-driven emulator specifically designed for coupled phase-field problems, and it can be readily extended to predict other evolutionary phenomena with more complex microstructural features.
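The LSTM stage operates on the 16-value latent vectors rather than on full images. The following pure-Python sketch shows a standard LSTM cell and an autoregressive rollout that feeds each predicted latent vector back as the next input. The hidden size, the random untrained weights, and the linear readout are assumptions for illustration; in the described framework, each predicted latent vector would be passed through the autoencoder's decoder to recover a 2D microstructure.

```python
import math
import random

LATENT = 16   # latent size produced by the autoencoder (from the text)
HIDDEN = 32   # hypothetical LSTM hidden size

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, b):
    """Return W @ x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(rng):
    """Random (untrained) weights for the four LSTM gates and a linear readout."""
    params = {}
    for gate in ("i", "f", "o", "g"):
        params["W" + gate] = random_matrix(HIDDEN, LATENT + HIDDEN, rng)
        params["b" + gate] = [0.0] * HIDDEN
    params["Wy"] = random_matrix(LATENT, HIDDEN, rng)
    params["by"] = [0.0] * LATENT
    return params

def lstm_step(x, h, c, p):
    """One LSTM cell update for latent input x, hidden state h, and cell state c."""
    xh = x + h  # concatenate input and previous hidden state
    i = [sigmoid(v) for v in affine(p["Wi"], xh, p["bi"])]    # input gate
    f = [sigmoid(v) for v in affine(p["Wf"], xh, p["bf"])]    # forget gate
    o = [sigmoid(v) for v in affine(p["Wo"], xh, p["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(p["Wg"], xh, p["bg"])]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def rollout(history, steps, p):
    """Warm up on observed latent frames, then predict future frames autoregressively."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for z in history:
        h, c = lstm_step(z, h, c, p)
    preds = []
    for _ in range(steps):
        z_next = affine(p["Wy"], h, p["by"])  # predicted next latent frame
        preds.append(z_next)
        h, c = lstm_step(z_next, h, c, p)     # feed the prediction back in
    return preds

rng = random.Random(0)
params = init_params(rng)
history = [[rng.gauss(0.0, 1.0) for _ in range(LATENT)] for _ in range(5)]
future = rollout(history, 3, params)
```

Feeding predictions back lets the model extrapolate beyond the observed frames, at the cost of accumulating error over long horizons.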
Machine learning interatomic potentials (ML-IAPs) and machine learning Hamiltonians (ML-Ham) have revolutionized atomistic and electronic structure simulations by offering near ab initio accuracy across extended time and length scales. In this Review, we summarize recent progress in these two fields, with emphasis on algorithmic and architectural innovations, geometric equivariance, data efficiency strategies, model-data co-design, and interpretable AI techniques. We also discuss key challenges, including data fidelity, model generalizability, computational scalability, and explainability. Finally, we outline promising future directions, such as active learning, multi-fidelity frameworks, scalable message-passing architectures, and methods for enhancing interpretability, the last of which is particularly crucial for the field of AI for Science (AI4S). Integrating these advances is expected to accelerate materials discovery and provide deeper mechanistic insights into complex material and physical systems.
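Geometric equivariance means that rotating an atomic configuration leaves the predicted energy unchanged and rotates the predicted forces in the same way. The check below uses a hypothetical harmonic pair potential as a stand-in for a learned model. Because it depends only on interatomic distances, its forces satisfy F(Rx) = R F(x), which is the property that equivariant ML-IAP architectures are designed to guarantee by construction.

```python
import math

def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def pair_energy(pos, k=1.0, r0=1.0):
    """Toy energy E = 0.5*k*sum_{i<j} (r_ij - r0)^2 (hypothetical stand-in for a model)."""
    e = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            e += 0.5 * k * (distance(pos[i], pos[j]) - r0) ** 2
    return e

def pair_forces(pos, k=1.0, r0=1.0):
    """Analytic forces F_i = -dE/dx_i for the toy energy above."""
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i, pi in enumerate(pos):
        for j, pj in enumerate(pos):
            if i == j:
                continue
            r = distance(pi, pj)
            scale = -k * (r - r0) / r
            for a in range(3):
                forces[i][a] += scale * (pi[a] - pj[a])
    return forces

def rotate_z(v, theta):
    """Rotate a 3-vector by angle theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

pos = [[0.0, 0.0, 0.0], [1.2, 0.1, 0.0], [0.3, 0.9, 0.4]]
theta = 0.7
rotated = [rotate_z(p, theta) for p in pos]

# Invariance: the energy is unchanged by the rotation.
energy_gap = abs(pair_energy(rotated) - pair_energy(pos))
# Equivariance: forces on the rotated structure equal the rotated original forces.
force_gap = max(
    abs(x - y)
    for f_rot, f in zip(pair_forces(rotated), pair_forces(pos))
    for x, y in zip(f_rot, rotate_z(f, theta))
)
```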
In scientific research, the effective use of unlabeled data has become pivotal, as exemplified by AlphaFold2, whose developers shared the 2024 Nobel Prize in Chemistry. Embracing this paradigm shift, we develop a universal self-supervised learning methodology for detecting surface defects in steel. By harnessing unlabeled data, our approach significantly reduces the dependence on manual annotation and improves scalability while training robust models that generalize across defect types. Using a Faster R-CNN framework, we achieved a mean average precision (mAP) of 0.385 and an mAP at an intersection-over-union (IoU) threshold of 0.5 (mAP_50) of 0.768 on the NEU-DET steel defect dataset. These results demonstrate the efficacy of our self-supervised strategy and its potential as a framework for building image detection systems for surface defect identification with minimal labeled data.
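mAP_50 counts a detection as correct when its IoU with a not-yet-matched ground-truth box is at least 0.5. The sketch below shows IoU, a simplified greedy matching, and VOC-style all-point-interpolated average precision for a single class; mAP averages this value over classes. If the reported mAP of 0.385 follows the COCO convention, it additionally averages over IoU thresholds from 0.5 to 0.95, and evaluation toolkits differ in interpolation details, so this is an illustration rather than the evaluation code used in the study. The boxes in the example are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, thr=0.5):
    """Greedily match (score, box) predictions, highest score first; return TP flags."""
    used, flags = set(), []
    for score, box in sorted(preds, key=lambda p: -p[0]):
        best, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j not in used and iou(box, gt) > best:
                best, best_j = iou(box, gt), j
        if best_j is not None and best >= thr:
            used.add(best_j)
            flags.append(True)
        else:
            flags.append(False)
    return flags

def average_precision(flags, n_gt):
    """Area under the interpolated precision-recall curve (all-point interpolation)."""
    tp = fp = 0
    points = []
    for f in flags:
        tp += f
        fp += not f
        points.append((tp / n_gt, tp / (tp + fp)))
    ap, prev_r = 0.0, 0.0
    for k, (r, _) in enumerate(points):
        p_interp = max(p for _, p in points[k:])  # precision envelope
        ap += (r - prev_r) * p_interp
        prev_r = r
    return ap

gts = [(0, 0, 2, 2), (1, 1, 3, 3)]
preds = [(0.9, (0, 0, 2, 2)), (0.8, (5, 5, 6, 6)), (0.7, (1, 1, 3, 3))]
flags = match_detections(preds, gts)
ap50 = average_precision(flags, len(gts))
```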
The discovery of high-performance organic light-emitting diode (OLED) materials is hindered by conventional intuition-driven design methodologies and the scarcity of purely organic luminescent scaffolds. Although machine learning models have improved the efficiency of high-throughput screening for OLED candidates, their effectiveness is still limited by the small size and low quality of available experimental datasets. In this study, we introduce LumiGen, an integrated framework for the de novo design of high-quality OLED candidate molecules with targeted photophysical properties. An iterative sampling-screening process gradually refines the molecular selection, enabling the transition from optimizing individual properties to generating all-round OLED candidates. Computational estimates indicate that the optical properties of most of the collected high-quality candidate molecules (approximately 80.2%) meet the required criteria. During iterative training, the Sampling Augmentor increases the proportion of OLED candidate molecules more than threefold (from 6.56% to 21.13%). Additionally, we successfully synthesized molecules based on a new scaffold identified among the OLED candidates, achieving a photoluminescence quantum yield of up to 88.6%. Only 0.33% of the molecules in the dataset outperform our synthesized molecules in overall optical performance. LumiGen thus learns molecular distribution patterns from disjoint labeled datasets and directly generates all-round OLED candidates, advancing OLED material discovery.
Understanding interactions between reactive species and surfaces remains a fundamental challenge in materials science and heterogeneous catalysis. Central to this challenge is the efficient and accurate generation of realistic surfaces and intermediate structures. Despite growing efforts, a universal and systematic approach to surface structure generation is still lacking, particularly for complex interfaces. Existing automated protocols often require extensive computations to identify stable configurations. Recent advances in dataset availability and machine learning techniques, especially in generative models, are beginning to show promise for tasks such as catalyst structure generation. In this perspective, we highlight the emerging capabilities of generative models in catalytic research and outline future directions for their applications. These include property-guided surface structure generation, efficient sampling of adsorption geometries, and the generation of complex transition-state structures. We aim to provide catalysis researchers with a clear view of current progress, outline key challenges, and identify opportunities for integrating generative models into the design and discovery of heterogeneous catalysts.
Machine learning (ML) provides robust solutions for electronic packaging, where growing complexity and miniaturization challenge traditional methods of design, defect detection, and performance optimization. This review systematically covers ML applications across key areas of electronic packaging, such as defect detection, material optimization, and reliability analysis, and discusses key algorithms, data workflows, inherent challenges, and prospects. It aims to provide a clear roadmap and reference for applying ML effectively in this rapidly evolving field. Nevertheless, persistent challenges in data quality, model adaptability, and integration with established engineering practices must still be addressed for continued progress in this domain.
High-performance solid-state hydrogen storage alloys are among the key factors enabling the widespread application of hydrogen energy. However, current materials still face challenges such as limited hydrogen storage capacity and excessive thermodynamic stability, which urgently need to be addressed. In this work, we constructed a large-scale solid-state hydride database, encompassing over 1,000 alloy systems and more than 6,000 valid data records. By integrating alloying strategies with machine learning (ML) techniques, the Magpie tool was utilized for feature generation, and a multi-objective regression model was developed to simultaneously predict absorption/desorption plateau pressure, enthalpy change, entropy change, and maximum hydrogen storage capacity using various ML algorithms. Furthermore, we achieved the inverse design of solid-state hydrogen storage materials using a variational autoencoder. By integrating the forward prediction and inverse design models, we developed a forward–inverse navigation and discovery platform for hydrogen storage alloys powered by data-driven ML: FIND. The forward module enables rapid prediction of absorption and desorption properties based on alloy composition and testing temperature. Building upon this, an advanced function allows fast prediction for multicomponent systems with flexible molar ratios. Subsequently, the inverse module facilitates the screening of potential alloy candidates based on user-defined target properties. Finally, the predictive models were integrated with a genetic algorithm to optimize alloy compositions within the Mg–Ni–La–Ce and Mg–Ni–La systems. Multiple novel high-performance alloy candidates were identified, providing a powerful tool and methodological foundation for high-throughput screening and intelligent development of hydrogen storage materials.
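The final step couples the predictive models with a genetic algorithm. The sketch below shows how such a search over Mg–Ni–La–Ce molar fractions could be organized, assuming tournament selection, blend crossover, Gaussian mutation, and elitism. `surrogate_score` is a hypothetical placeholder for a trained regression model (it is not the FIND model), and the population size, generation count, and mutation width are illustrative.

```python
import random

ELEMENTS = ("Mg", "Ni", "La", "Ce")

def surrogate_score(x):
    """Hypothetical placeholder for a trained property model: closeness to an arbitrary target."""
    target = (0.70, 0.15, 0.10, 0.05)  # illustrative values only
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def normalize(x):
    """Clip negative fractions to zero and rescale so the molar fractions sum to 1."""
    x = [max(v, 0.0) for v in x]
    s = sum(x)
    return [v / s for v in x] if s > 0 else [1.0 / len(x)] * len(x)

def genetic_search(score, n_pop=40, n_gen=60, sigma=0.05, seed=0):
    rng = random.Random(seed)
    pop = [normalize([rng.random() for _ in ELEMENTS]) for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: the two best compositions survive unchanged
        while len(next_pop) < n_pop:
            # Tournament selection: best of three random individuals, done twice.
            a = max(rng.sample(pop, 3), key=score)
            b = max(rng.sample(pop, 3), key=score)
            w = rng.random()
            child = [w * ai + (1 - w) * bi for ai, bi in zip(a, b)]  # blend crossover
            child = [v + rng.gauss(0.0, sigma) for v in child]      # Gaussian mutation
            next_pop.append(normalize(child))
        pop = next_pop
    return max(pop, key=score)

best = genetic_search(surrogate_score)
print(dict(zip(ELEMENTS, (round(v, 3) for v in best))))
```

Renormalizing every child keeps each candidate on the composition simplex, and elitism guarantees that the best score never decreases from one generation to the next.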
Dislocation emission at crack tips in face-centered cubic (FCC) metals generally involves multiple dislocation nucleation mechanisms governed by the motion of atoms near the crack tips. Taking FCC aluminum as a representative case, this work investigates the emission behavior and nucleation mechanism of dislocations at crack tips in FCC metals using continuum mechanics models within the framework of anisotropic linear elastic fracture mechanics. The evolution of the system energy and the motion of atoms near the crack tips under Mode I loading are first obtained from molecular dynamics simulations. The underlying thermo-kinetic origins of dislocation emission are clarified based on the driving force U and energy barrier
The International Workshop on Data-Driven Computational and Theoretical Materials Design was held on October 9–13, 2024, in Shanghai, gathering leading scientists and researchers from around the world who represent various aspects of data-driven AI methodologies and applications in materials design. The topics of the 46 talks and 29 posters spanned a wide range of recent advances, including Machine Learning for Materials Design, Method Development, Machine Learning Interatomic Potentials, Advanced Computing, Infrastructure and Standards, Large Language Models, and Autonomous Labs. As part of the workshop, a panel discussion titled “Unlocking the AI Future of Materials Science” was held to convey the state of the art of AI/ML in materials science and to consider future directions. This report, prepared for this Special Issue, synthesizes the panel discussion, drawing on insights from the workshop as a whole and from surrounding conversations, in particular on the question of what constitutes success.
Machine learning (ML) has become a cornerstone of modern materials science, offering powerful tools for predicting material properties and accelerating experimental workflows. However, its widespread adoption is often hindered by the steep learning curve of programming languages such as Python, which presents a significant technical barrier for many domain experts. To address this challenge, we introduce MatSci-ML Studio, an interactive, user-friendly software toolkit designed for materials scientists with limited coding expertise. Unlike traditional code-based frameworks, MatSci-ML Studio provides an intuitive graphical user interface that encapsulates a complete end-to-end ML workflow, guiding users through data management, advanced preprocessing, multi-strategy feature selection, automated hyperparameter optimization, and model training, and thereby making advanced computational analysis accessible to the materials community. It also incorporates a SHapley Additive exPlanations (SHAP)-based interpretability module for explaining model predictions and a multi-objective optimization engine for exploring complex design spaces. Representative case studies demonstrate the practicality and effectiveness of MatSci-ML Studio, confirming its capacity to lower the technical barrier to ML applications, foster innovation, and improve the efficiency of data-driven materials science.
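SHAP attributes a single prediction to its input features using Shapley values. As a minimal sketch of the underlying idea (not the toolkit's implementation and not the `shap` library), the function below computes exact Shapley values by enumerating every feature coalition, assuming that absent features are replaced by values from a single baseline vector. `toy_model` is hypothetical. Exact enumeration scales as 2^n in the number of features, which is why practical SHAP implementations rely on sampling approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                phi += weight * (value(set(S) | {i}) - value(set(S)))
        phis.append(phi)
    return phis

def toy_model(z):
    """Hypothetical model with two linear terms and one interaction term."""
    return 2.0 * z[0] - z[1] + z[0] * z[2]

x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
```

The values satisfy the efficiency property: they sum to `toy_model(x) - toy_model(baseline)`, and the interaction term is split equally between the two features that form it.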
MXenes, with tunable compositions and rich surface chemistry, enable precise control of electronic, optical, and mechanical properties, making them promising materials for electronics and energy-related applications. In particular, the work function plays a critical role in determining their physicochemical properties. However, accurately predicting the work function of MXenes with machine learning (ML) remains challenging because robust models that combine high accuracy with interpretability are lacking. To this end, we propose a stacked model and introduce high-quality descriptors constructed via the Sure Independence Screening and Sparsifying Operator (SISSO) method to improve the prediction accuracy of MXene work functions. The stacked model first generates predictions from multiple base models and then uses these predictions as inputs to a meta-model for secondary learning, thereby enhancing both predictive performance and generalization capability. The results show that integrating the high-quality descriptors significantly improves model performance, yielding a coefficient of determination of 0.95 and a mean absolute error of 0.2. Finally, we demonstrate that the work functions of MXenes are predominantly governed by their surface functional groups, with SHapley Additive exPlanations (SHAP) value analysis quantitatively resolving the structure–property relationship between surface functional groups and the work function. Specifically, O terminations lead to the highest work functions, whereas OH terminations yield the lowest (a reduction of over 50%), and transition metals or C/N elements have a relatively smaller effect. This work achieves a favorable balance between accuracy and interpretability in ML predictions of MXene work functions, providing both fundamental insights and practical tools for materials discovery.
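A stacked model feeds base-model predictions into a meta-model. The sketch below illustrates the idea with a holdout ("blending") variant: two hypothetical univariate least-squares base models are fitted on one data split, and a two-weight linear meta-model is fitted on their predictions over a second split. The synthetic data, the choice of base models, and the no-intercept meta-model are assumptions for illustration; they do not reproduce the paper's SISSO descriptors or model set.

```python
import random

def fit_univariate(xs, ys):
    """Ordinary least squares y ≈ a + b*x on a single feature; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_meta(p1, p2, ys):
    """Least squares y ≈ w1*p1 + w2*p2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, ys))
    t2 = sum(b * y for b, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

# Hypothetical synthetic data: two descriptors and a noisy linear target.
rng = random.Random(1)
rows = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)]
ys = [1.5 * a - 0.8 * b + rng.gauss(0.0, 0.05) for a, b in rows]
base_rows, meta_rows, base_y, meta_y = rows[:100], rows[100:], ys[:100], ys[100:]

# Level 0: each base model sees one descriptor and is fitted on the first split.
m1 = fit_univariate([r[0] for r in base_rows], base_y)
m2 = fit_univariate([r[1] for r in base_rows], base_y)

# Level 1: the meta-model learns from base predictions on the held-out split.
p1 = [m1(r[0]) for r in meta_rows]
p2 = [m2(r[1]) for r in meta_rows]
w1, w2 = fit_meta(p1, p2, meta_y)
stacked = [w1 * a + w2 * b for a, b in zip(p1, p2)]
```

Because the meta-model can reproduce either base model (weights (1, 0) or (0, 1)), its error on the meta split cannot exceed that of the better base model. In practice, k-fold out-of-fold predictions are commonly used instead of a single holdout so that all training data contribute to both levels.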
An inverse design framework, RF-NSGA-II, is developed using machine learning (ML) methods and a multi-objective co-optimization strategy. It enables intelligent design of chemical compositions and processing parameters for thermo-mechanical treatments based on desired mechanical properties of Mg alloys. Using a database of extruded Mg-Gd and Mg-Y-based alloys, RF-NSGA-II integrates an optimized forward model with a non-dominated sorting genetic algorithm II (NSGA-II). The forward model is constructed by evaluating the performance of different ML algorithms, with the random forest (RF) algorithm experimentally validated to accurately describe the relationship between chemical composition and mechanical properties. RF-NSGA-II simultaneously optimizes multiple mechanical properties, and validation through experimental measurements demonstrates its effectiveness. Using target mechanical properties as inputs, chemical compositions and processing parameters for solid-solution treatment and extrusion are efficiently determined for a high-strength Mg-11.5Gd-6.0Y-1.0Zn-0.2Mn (wt.%) alloy and a high-ductility Mg-2.5Gd-1.0Zn (wt.%) alloy, achieving tensile strength/elongation values of 417 MPa/3.2% and 223 MPa/34%, respectively. These results provide a transparent and effective route for the inverse design of advanced Mg alloys based on desired mechanical properties.
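NSGA-II ranks candidate solutions by Pareto non-domination before applying crowding-distance selection. The sketch below shows only the ranking step, assuming two objectives that are both maximized (for example, tensile strength and elongation). It peels successive fronts in O(N³) time rather than using NSGA-II's faster bookkeeping, but it assigns the same ranks. The candidate values are hypothetical and are not results from the study.

```python
def dominates(a, b):
    """True if a is at least as good as b in every (maximized) objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_ranks(points):
    """Assign each point its non-domination rank by repeatedly removing the current Pareto front."""
    ranks = [None] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

# Hypothetical (tensile strength in MPa, elongation in %) pairs.
candidates = [(400, 5), (250, 30), (300, 10), (200, 5), (350, 4)]
ranks = pareto_ranks(candidates)
```

Rank-0 candidates form the current trade-off front between strength and ductility; NSGA-II prefers lower ranks and breaks ties within a front by crowding distance to preserve diversity.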
Data-driven research is in the spotlight across many science and engineering fields, including materials science, with the expectation that effective use of data, supported by modern artificial intelligence techniques, can lead to breakthroughs on key scientific questions. The Korea Research Institute of Chemical Technology (KRICT) Chemical Data Explorer (ChemDX) is our integrated web-based platform; it comprises various data-exploration and artificial intelligence modules and aims to enhance the accessibility of chemical data for digital materials discovery. In this article, we highlight the results of the 2024 KRICT ChemDX Hackathon, an event held to support data-driven research in chemistry and materials science. Participants explored the ChemDX platform and developed projects ranging from machine learning models and data visualization tools to user interface improvements. These projects demonstrated the versatility of data-driven research supported by ChemDX and its potential to bridge experimental and computational research. The feedback and outcomes from the hackathon highlight the potential of interdisciplinary data-driven research and will guide further improvements to the platform's usability and outreach.
Accurately characterizing dislocation behavior, the driving force behind the nucleation and growth of recrystallized grains, has long been a formidable challenge for traditional cellular automata methods. Here we present a machine learning-enhanced cellular automaton framework that maps dislocation substructures and models static recrystallization (SRX) behavior. By incorporating the dislocation escape assumption during recovery, the framework removes spatial resolution limitations and captures the mesoscopic dynamics of dislocation evolution with high precision. The approach effectively predicts the recrystallization kinetics of a typical austenitic alloy, as supported by experimental validation. The deep learning-based dislocation implantation module, SRX-net, outperforms traditional techniques such as random forests and U-net in identifying complex intracrystalline substructures and handling uneven strain concentrations. The simulations provide insight into dislocation density variations, revealing significant local fluctuations driven by dislocation escape. Importantly, although the migration and accumulation of dislocations may fail to satisfy nucleation conditions during grain growth, the model accurately predicts average SRX grain sizes without introducing unphysical artifacts. The framework substantially reduces time-to-solution, enabling comprehensive parametric studies and near real-time recrystallization simulations for industrial applications.
In this study, we systematically investigate the thermal and electronic transport properties of a two-dimensional (2D) PbSe/PbTe monolayer heterostructure by combining first-principles calculations, Boltzmann transport theory, and machine learning methods. The heterostructure exhibits a unique honeycomb-like corrugated and asymmetric configuration, which significantly enhances phonon scattering. Moreover, the relatively weak interatomic interactions in PbSe/PbTe lead to the formation of antibonding states, resulting in strong anharmonicity and ultimately yielding ultralow lattice thermal conductivity