Impacts of random negative training datasets on machine learning-based geologic hazard susceptibility assessment
Hao Cheng, Wei Hong, Zhen-kai Zhang, Zeng-lin Hong, Zi-yao Wang, Yu-xuan Dong
China Geology, 2025, Vol. 8, Issue 4: 676-690.
This study investigated how random negative training datasets (NTDs) affect the uncertainty of machine learning models for geologic hazard susceptibility assessment on the Loess Plateau, northern Shaanxi Province, China. Using 40 randomly generated NTDs, the study developed susceptibility assessment models with the random forest algorithm and evaluated their performance using the area under the receiver operating characteristic curve (AUC). The means and standard deviations of the AUC values from all models were used to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment, as well as the uncertainty introduced by the NTDs. A risk and return methodology was then employed to quantify and mitigate this uncertainty, with log odds ratios used to characterize susceptibility levels. The risk and return values were calculated from the standard deviations and means, respectively, of the log odds ratios at each location. After the mean log odds ratios were converted into probability values, the final susceptibility map, which accounts for the uncertainty induced by random NTDs, was plotted. The results indicate that the AUC values of the models ranged from 0.810 to 0.963, with a mean of 0.852 and a standard deviation of 0.035, indicating good predictive performance together with a degree of uncertainty. The risk and return analysis reveals that low-risk, high-return areas correspond to lower standard deviations and higher means across the multiple model-derived assessments. Overall, this study introduces a new framework for quantifying the uncertainty across multiple trained and evaluated models, aimed at improving their robustness and reliability.
Additionally, by identifying low-risk and high-return areas, resource allocation for geologic hazard prevention and control can be optimized, thus ensuring that limited resources are directed toward the most effective prevention and control measures.
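The risk and return step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each model outputs a susceptibility probability per location, takes the log odds of that probability, and then uses the mean (return) and standard deviation (risk) across models. The function names, the clipping constant `eps`, the use of the sample standard deviation, and the toy input values are all assumptions made for illustration.

```python
import math
import statistics


def logit(p, eps=1e-6):
    """Log odds of probability p; p is clipped to avoid log(0) (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_return(probs_per_model):
    """Summarize per-location predictions from several models.

    probs_per_model: list of lists; each inner list holds one model's
    susceptibility probabilities, in the same location order.
    Returns one dict per location with:
      return      - mean of the log odds across models
      risk        - sample standard deviation of the log odds across models
      probability - mean log odds converted back to a probability
    """
    n_locations = len(probs_per_model[0])
    summary = []
    for j in range(n_locations):
        log_odds = [logit(model[j]) for model in probs_per_model]
        mean_lo = statistics.mean(log_odds)
        std_lo = statistics.stdev(log_odds)
        summary.append(
            {"return": mean_lo, "risk": std_lo, "probability": sigmoid(mean_lo)}
        )
    return summary


# Toy input: three hypothetical models (the study used 40), three locations.
toy_predictions = [
    [0.90, 0.20, 0.55],
    [0.90, 0.80, 0.60],
    [0.90, 0.50, 0.50],
]

for location, row in enumerate(risk_return(toy_predictions)):
    print(location, round(row["return"], 3), round(row["risk"], 3),
          round(row["probability"], 3))
```

In this toy input, the first location receives the same high prediction from every model, so it has a high return and zero risk. The second location has predictions that disagree strongly, so its risk is large even though its mean log odds is near zero.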
Landslides / Debris flows / Collapses / Ground fissures / Geologic hazard prevention and control engineering / Geologic hazard susceptibility assessment / Negative training dataset / Average spatial correlation / Random forest algorithm / Risk and return analysis / Geological survey engineering / Loess Plateau area