DSMR: an AI framework for exploring combinations of data and algorithm to overcome efficiency-accuracy trade-off

Jianhua Chen, Junwei Chen, Boyu Zhao, Yunying Fan, Zhigang Yu, Jun Luan, Kuochih Chou

Journal of Materials Informatics ›› 2025, Vol. 5 ›› Issue (3) : 40 DOI: 10.20517/jmi.2025.20
Research Article


Abstract

Machine learning models are highly effective at predicting the properties of novel materials. In principle, the optimal model can be found by exhaustively searching all combinations of data subsets, algorithms, and hyperparameters; the fundamental challenge is finding an efficient path through this immense search space. In this paper, we address this challenge by proposing an active learning-based data screening and model retrieval (DSMR) framework. The framework builds improved models from internal data and can incorporate additional external data to improve model performance further. We validated it systematically on four datasets covering both classification and regression tasks. In every case, a superior model was obtained within 10 iterative cycles, improving on state-of-the-art results in the current literature by 3.3% to 10.3%. For AlCoCrCuFeNi hardness, the framework reduced modeling error by 10.3% using internal data alone and by 42.6% when additional external hardness data were integrated. The framework balances computational efficiency and predictive accuracy while enabling deeper data exploration, and its low-code, user-friendly implementation makes it a promising tool for materials design.
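To give a sense of why an exhaustive search is impractical, the short sketch below counts the candidate combinations of data subset, algorithm, and hyperparameter setting. It is an illustration only, not part of the framework: the function name `search_space_size`, the assumption that every non-empty subset of the samples is a valid training set, and the example sizes (5 algorithms, 20 hyperparameter settings each) are all hypothetical.

```python
def search_space_size(n_samples: int, n_algorithms: int,
                      n_hyperparameter_settings: int) -> int:
    """Count (data subset, algorithm, hyperparameter setting) combinations.

    Assumption (not from the paper): every non-empty subset of the samples
    is a valid training set, and each algorithm has the same number of
    hyperparameter settings.
    """
    if min(n_samples, n_algorithms, n_hyperparameter_settings) < 1:
        raise ValueError("all counts must be at least 1")
    n_subsets = 2 ** n_samples - 1  # every non-empty subset of the data
    return n_subsets * n_algorithms * n_hyperparameter_settings


# Hypothetical sizes chosen only for illustration.
for n in (10, 50, 100):
    total = search_space_size(n, n_algorithms=5, n_hyperparameter_settings=20)
    print(f"{n} samples -> {total} candidate models")
```

Because the number of subsets doubles with every added sample, even a dataset of modest size rules out exhaustive enumeration, which is why a search limited to a small number of iterative cycles is attractive.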

Keywords

Active learning / data enhancement / model optimization

Cite this article

Jianhua Chen, Junwei Chen, Boyu Zhao, Yunying Fan, Zhigang Yu, Jun Luan, Kuochih Chou. DSMR: an AI framework for exploring combinations of data and algorithm to overcome efficiency-accuracy trade-off. Journal of Materials Informatics, 2025, 5(3): 40. DOI: 10.20517/jmi.2025.20


