Groundwater potential mapping under data constraints: the critical roles of sample size and feature dimensionality

Zitao WANG, Jiansen LI, Yang LIU, Jinjun HAN

Front. Earth Sci. DOI: 10.1007/s11707-026-0213-6
RESEARCH ARTICLE

Abstract

Reliable groundwater potential (GWP) mapping is essential for water resource planning, yet most studies emphasize algorithmic comparisons and neglect how data constraints, specifically limited sample sizes and high feature dimensionality, govern model reliability. This study examines these effects using Qinghai Province, China, as a case study. A data set of 682,721 grid cells and 20 environmental factors was established. To assess whether sample-size sensitivity generalizes across algorithms, a benchmark test was conducted with five machine learning methods: Random Forest (RF), Support Vector Machine, Multilayer Perceptron, Logistic Regression, and Gradient Boosting Decision Tree. The results show that sample size was the dominant factor across all tested algorithms; models trained on fewer than 100 samples were universally unstable, yielding accuracies below 0.60. With RF as the representative model, performance rose sharply and converged near an accuracy of 0.82 and a macro F1 score of 0.78 once the training set reached 1000–2000 samples. Feature dimensionality played a secondary but notable role: retaining 8–10 key variables related to surface water connectivity, topography, and human activity improved accuracy and spatial coherence, whereas redundant features produced spatially fragmented predictions. Model interpretability was also data-dependent: SHAP (SHapley Additive exPlanations) values were unstable under small sample sizes but stabilized with sufficient data. Overall, this study demonstrates that sampling sufficiency and parsimonious feature selection are critical determinants of robust and interpretable GWP mapping, offering methodological guidance on data requirements that extends beyond algorithm choice.
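
The sample-size sensitivity test summarized above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes scikit-learn is available, substitutes synthetic data from `make_classification` for the Qinghai grid cells, and uses a hypothetical function name (`sample_size_sweep`), size grid, and repeat count chosen only for illustration. Only the Random Forest case is shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def sample_size_sweep(sizes=(50, 100, 500, 1000, 2000), n_repeats=5, seed=0):
    """Return {size: (mean accuracy, std accuracy, mean macro F1)}."""
    # Synthetic stand-in with 20 features, 8 of them informative.
    # ASSUMPTION: this is NOT the study's data set.
    X, y = make_classification(
        n_samples=6000, n_features=20, n_informative=8,
        n_redundant=6, n_classes=2, random_state=seed,
    )
    # Hold out a fixed test set so every training size is scored identically.
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=2000, stratify=y, random_state=seed
    )
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        accs, f1s = [], []
        # Repeated random subsampling exposes instability at small n.
        for _ in range(n_repeats):
            idx = rng.choice(len(X_pool), size=n, replace=False)
            model = RandomForestClassifier(
                n_estimators=50, random_state=int(rng.integers(1_000_000))
            )
            model.fit(X_pool[idx], y_pool[idx])
            pred = model.predict(X_test)
            accs.append(accuracy_score(y_test, pred))
            f1s.append(f1_score(y_test, pred, average="macro"))
        results[n] = (float(np.mean(accs)), float(np.std(accs)),
                      float(np.mean(f1s)))
    return results


if __name__ == "__main__":
    for n, (acc, sd, f1) in sample_size_sweep().items():
        print(f"n={n:5d}  accuracy={acc:.3f} +/- {sd:.3f}  macro-F1={f1:.3f}")
```

The spread of accuracy across repeats at each size is a rough proxy for the instability the abstract reports below 100 samples. The numbers produced on synthetic data are not comparable to the values reported for Qinghai Province.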

Keywords

Groundwater potential mapping / Sample size sensitivity / Feature selection / Model interpretability / Monte Carlo uncertainty analysis

Cite this article

Zitao WANG, Jiansen LI, Yang LIU, Jinjun HAN. Groundwater potential mapping under data constraints: the critical roles of sample size and feature dimensionality. Front. Earth Sci. DOI: 10.1007/s11707-026-0213-6



RIGHTS & PERMISSIONS

Higher Education Press
