Groundwater potential mapping under data constraints: the critical roles of sample size and feature dimensionality
Zitao WANG, Jiansen LI, Yang LIU, Jinjun HAN
Reliable groundwater potential (GWP) mapping is essential for water resource planning, yet most studies emphasize algorithmic comparisons and neglect how data constraints, specifically limited sample sizes and high feature dimensionality, govern model reliability. This study examined these effects using Qinghai Province, China, as a case study. A data set of 682,721 grid cells and 20 environmental factors was established. To assess whether sample-size sensitivity generalizes across algorithms, a benchmark test was conducted with five machine learning methods: Random Forest (RF), Support Vector Machine, Multilayer Perceptron, Logistic Regression, and Gradient Boosting Decision Tree. The results show that sample size is the dominant factor across all tested algorithms: models trained on fewer than 100 samples were universally unstable, yielding accuracies below 0.60. With RF as the representative model, performance rose sharply and converged near an accuracy of 0.82 and a macro F1 score of 0.78 once the training set reached 1000–2000 samples. Feature dimensionality played a secondary but notable role: retaining 8–10 key variables related to surface water connectivity, topography, and human activity improved accuracy and spatial coherence, whereas redundant features produced spatially fragmented predictions. Model interpretability was also data-dependent: SHAP values were unstable under small sample sizes but stabilized with sufficient data. Overall, this study demonstrates that sampling sufficiency and parsimonious feature selection are critical determinants of robust and interpretable GWP mapping, offering methodological guidance on data requirements that extends beyond algorithm choice.
Groundwater potential mapping / Sample size sensitivity / Feature selection / Model interpretability / Monte Carlo uncertainty analysis
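The sample-size sensitivity test summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, a nearest-centroid classifier stands in for Random Forest so the example needs only the Python standard library, and every name and parameter (`make_synthetic`, `sample_size_curve`, the sizes, the 30 repeats) is a hypothetical choice made for illustration. The idea is to redraw training subsets of each size many times (Monte Carlo repetition) and report the mean and spread of accuracy on a fixed held-out set.

```python
import random
import statistics


def make_synthetic(n, n_features=5, seed=0):
    """Synthetic two-class data (assumption): class 1 is shifted by +1 on every feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(float(label), 1.5) for _ in range(n_features)]
        data.append((x, label))
    return data


def fit_centroids(train):
    """Nearest-centroid stand-in for the study's models: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def accuracy(centroids, test):
    """Fraction of held-out samples classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def sample_size_curve(pool, test, sizes, n_repeats=30, seed=1):
    """For each training size, redraw the training subset n_repeats times
    and record the mean and standard deviation of test accuracy."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        scores = [
            accuracy(fit_centroids(rng.sample(pool, n)), test)
            for _ in range(n_repeats)
        ]
        curve[n] = (statistics.mean(scores), statistics.stdev(scores))
    return curve


data = make_synthetic(6000)
pool, test = data[:5000], data[5000:]  # sampling pool and fixed held-out set
curve = sample_size_curve(pool, test, sizes=[10, 50, 100, 500, 2000])
for n, (mean_acc, sd_acc) in curve.items():
    print(f"n={n:5d}  accuracy={mean_acc:.3f} +/- {sd_acc:.3f}")
```

On data like this, the spread across repeats should shrink as the training size grows, which is the kind of stabilization the abstract reports. The specific thresholds in the study (100 and 1000–2000 samples) come from its own data and models and are not reproduced by this toy setup.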
Higher Education Press