Virtual sample generation in machine learning assisted materials design and discovery

Pengcheng Xu, Xiaobo Ji, Minjie Li, Wencong Lu

Journal of Materials Informatics, 2023, Vol. 3, Issue 3: 16. DOI: 10.20517/jmi.2023.18
Review

Abstract

Virtual sample generation (VSG), as a cutting-edge technique, has been successfully applied in machine learning-assisted materials design and discovery. A virtual sample is defined as an unknown sample that has not been experimentally validated; it is either expanded from the original data distribution for modeling or designed via algorithms for prediction. This review discusses the applications of VSG techniques in machine learning-assisted materials design and discovery based on research progress in recent years. First, we summarize the VSG algorithms commonly used in materials design and discovery to expand the training set, including bootstrap, Monte Carlo, particle swarm optimization, mega trend diffusion, Gaussian mixture model, random forest, and generative adversarial networks. Next, we introduce the searching algorithms frequently employed for materials discovery, including particle swarm optimization, efficient global optimization, and proactive searching progress. Then, we present widely adopted inverse design methods, including genetic algorithm, Bayesian optimization, and pattern recognition inverse projection. Finally, we propose future directions for VSG in the design and discovery of materials.
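Mega trend diffusion (MTD), one of the VSG algorithms listed above, expands a small training set by estimating a wider plausible range for each feature and drawing new values from it. The Python block below is a minimal sketch of that idea as it is commonly formulated in the small-data learning literature, not the implementation used in this review. Assumptions to note: the function names `mtd_bounds`, `triangular_membership`, and `mtd_virtual_samples` are hypothetical; each feature is diffused independently of the others; `phi=1e-20` is the membership threshold typically used with MTD; and values are drawn by rejection sampling against a triangular membership function that peaks at the midpoint of the observed range. Target values for the virtual rows are not generated here.

```python
import math
import random
import statistics


def mtd_bounds(values, phi=1e-20):
    """Return (lower, centre, upper) for one feature, MTD style.

    The centre is the midpoint of the observed min and max. Each side is
    pushed outward by a term that grows with the sample variance and is
    weighted by the share of points lying on that side (the skew).
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: nothing to diffuse
        return lo, lo, hi
    centre = (lo + hi) / 2.0
    n_l = sum(1 for v in values if v < centre)  # points strictly below centre
    n_u = sum(1 for v in values if v > centre)  # points strictly above centre
    var = statistics.variance(values)  # sample variance
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    # log(phi) < 0, so both square-root arguments are non-negative
    lower = centre - skew_l * math.sqrt(-2.0 * var / n_l * math.log(phi))
    upper = centre + skew_u * math.sqrt(-2.0 * var / n_u * math.log(phi))
    # never let the diffused domain be narrower than the observed range
    return min(lower, lo), centre, max(upper, hi)


def triangular_membership(x, lower, centre, upper):
    """Membership degree: 1 at the centre, falling linearly to 0 at the bounds."""
    if x < lower or x > upper:
        return 0.0
    if x <= centre:
        return 1.0 if centre == lower else (x - lower) / (centre - lower)
    return 1.0 if upper == centre else (upper - x) / (upper - centre)


def mtd_virtual_samples(data, n_virtual, seed=None):
    """Generate n_virtual feature rows from a small dataset (list of tuples)."""
    rng = random.Random(seed)
    domains = [mtd_bounds(column) for column in zip(*data)]
    samples = []
    for _ in range(n_virtual):
        row = []
        for lower, centre, upper in domains:
            # rejection sampling: propose uniformly, accept with prob = membership
            while True:
                x = rng.uniform(lower, upper)
                if rng.random() < triangular_membership(x, lower, centre, upper):
                    row.append(x)
                    break
        samples.append(tuple(row))
    return samples


if __name__ == "__main__":
    small_set = [(1.0, 10.0), (1.5, 12.0), (2.0, 11.0), (3.0, 15.0)]
    for row in mtd_virtual_samples(small_set, n_virtual=5, seed=0):
        print(row)
```

Calling `mtd_virtual_samples(small_set, n_virtual=50, seed=1)` returns 50 two-feature rows whose values can fall outside the observed range, which is the purpose of the diffusion step. Because each feature is sampled independently, correlations between features are not preserved in this sketch, and how targets are assigned to the virtual rows is left open.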

Keywords

Materials machine learning / virtual sample generation / searching algorithms / inverse design

Cite this article

Pengcheng Xu, Xiaobo Ji, Minjie Li, Wencong Lu. Virtual sample generation in machine learning assisted materials design and discovery. Journal of Materials Informatics, 2023, 3(3): 16. DOI: 10.20517/jmi.2023.18

