Groundwater contaminant source identification considering unknown boundary condition based on an automated machine learning surrogate

Yaning Xu, Wenxi Lu, Zidong Pan, Chengming Luo, Yukun Bai, Shuwei Qiu

Geoscience Frontiers ›› 2024, Vol. 15 ›› Issue (1) : 101732.

DOI: 10.1016/j.gsf.2023.101732
Research Paper


Abstract

Groundwater contamination source identification (GCSI) is a prerequisite for contamination risk evaluation and for efficient groundwater remediation programs. Previous GCSI studies have generally treated the boundary condition as a known variable. In many practical cases, however, the boundary condition is complicated and cannot be estimated accurately in advance, so treating it as known may deviate seriously from the actual situation and distort the identification results. Moreover, GCSI results are affected by several factors, including contaminant source information, model parameters, and the boundary condition; if the boundary condition is estimated inaccurately, the other factors will be estimated inaccurately as well. This study therefore focuses on the unknown boundary condition and proposes to identify three types of unknown variables: contaminant source information, model parameters, and the boundary condition. When the simulation-optimization (S-O) method is applied to GCSI, the huge computational load is usually reduced by building surrogate models. Building a surrogate model, however, requires researchers to select a model and optimize its hyperparameters, which can be a lengthy process. In this study, the automated machine learning (AutoML) method is used to build the surrogate model; it automates model selection and hyperparameter optimization, greatly reducing manual effort and saving time. The accuracy of the AutoML surrogate model is compared with that of surrogate models built with the eXtreme Gradient Boosting (XGBoost), random forest (RF), extra trees regressor (ETR), and elastic net (EN) methods, which are the methods automatically selected in the AutoML process. The results show that the surrogate model constructed by the AutoML method is the most accurate of the five. This study provides reliable support for GCSI.
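
The idea of automating surrogate model selection and hyperparameter choice can be illustrated with a minimal sketch. It is not the authors' AutoML pipeline: the `simulate` function is a hypothetical stand-in for an expensive groundwater simulation, the candidate models (a mean baseline and k-nearest-neighbour regressors with different `k`) are chosen only for brevity, and all names are assumptions made for illustration. Each candidate is scored by cross-validated error, and the one with the lowest error is kept as the surrogate.

```python
import random


def simulate(source_flux, boundary_head):
    """Hypothetical stand-in for an expensive flow-and-transport simulation."""
    return 0.8 * source_flux + 0.05 * source_flux * boundary_head + 0.3 * boundary_head ** 2


def mean_fit(X, y):
    """Baseline candidate: always predict the mean of the training outputs."""
    mean_y = sum(y) / len(y)
    return lambda x: mean_y


def knn_factory(k):
    """Build a fit function for a k-nearest-neighbour regressor (k is the hyperparameter)."""
    def fit(X, y):
        def predict(x):
            # Rank training points by squared Euclidean distance to x.
            order = sorted(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
            )
            nearest = order[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return predict
    return fit


def cv_mse(fit, X, y, folds=5):
    """Mean squared error of a candidate under simple k-fold cross-validation."""
    n = len(X)
    total = 0.0
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        total += sum((predict(X[i]) - y[i]) ** 2 for i in test_idx)
    return total / n


def auto_select(candidates, X, y):
    """Score every candidate and return the name with the lowest CV error."""
    scores = {name: cv_mse(fit, X, y) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Training samples: unknown variables drawn at random, outputs from the simulator.
rng = random.Random(0)
X = [(rng.uniform(0, 10), rng.uniform(0, 5)) for _ in range(200)]
y = [simulate(s, h) for s, h in X]

candidates = {
    "mean": mean_fit,
    "knn_k1": knn_factory(1),
    "knn_k3": knn_factory(3),
    "knn_k15": knn_factory(15),
}
best_name, scores = auto_select(candidates, X, y)

# Refit the selected candidate on all samples to obtain the final surrogate.
surrogate = candidates[best_name](X, y)
```

In an S-O workflow, the optimizer would then call `surrogate` in place of `simulate`, so that the expensive simulation runs only when the training samples are generated.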

Keywords

Groundwater contamination source / Boundary condition / Automated machine learning / Surrogate model

Cite this article

Yaning Xu, Wenxi Lu, Zidong Pan, Chengming Luo, Yukun Bai, Shuwei Qiu. Groundwater contaminant source identification considering unknown boundary condition based on an automated machine learning surrogate. Geoscience Frontiers, 2024, 15(1): 101732. https://doi.org/10.1016/j.gsf.2023.101732
