Impacts of random negative training datasets on machine learning-based geologic hazard susceptibility assessment

Hao Cheng, Wei Hong, Zhen-kai Zhang, Zeng-lin Hong, Zi-yao Wang, Yu-xuan Dong

China Geology, 2025, Vol. 8, Issue (4): 676-690.

DOI: 10.31035/cg2024094

Abstract

This study investigated the impacts of random negative training datasets (NTDs) on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau, northern Shaanxi Province, China. Based on 40 randomly generated NTDs, the study developed susceptibility assessment models using the random forest algorithm and evaluated their performance using the area under the receiver operating characteristic curve (AUC). The means and standard deviations of the AUC values from all models were then used to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment, as well as the uncertainty introduced by the NTDs. A risk and return methodology was employed to quantify and mitigate this uncertainty, with log odds ratios used to characterize the susceptibility assessment levels. The risk and return values were calculated from the standard deviations and means, respectively, of the log odds ratios at each location. After the mean log odds ratios were converted into probability values, the final susceptibility map was plotted, accounting for the uncertainty induced by random NTDs. The results indicate that the AUC values of the models ranged from 0.810 to 0.963, with an average of 0.852 and a standard deviation of 0.035, indicating good predictive performance together with a measurable degree of uncertainty. The risk and return analysis reveals that low-risk, high-return areas are those with lower standard deviations and higher means across the multiple model-derived assessments. Overall, this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models, aimed at improving their robustness and reliability. Additionally, by identifying low-risk and high-return areas, resource allocation for geologic hazard prevention and control can be optimized, thus ensuring that limited resources are directed toward the most effective prevention and control measures.
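
The risk and return step described in the abstract can be illustrated with a minimal sketch. It assumes that each of the models (one per random NTD) has already produced a susceptibility probability for every location; the function names, the clipping constant `eps`, the use of the population standard deviation, and the example numbers are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import statistics


def log_odds(p, eps=1e-6):
    """Log odds ratio ln(p / (1 - p)); p is clipped away from 0 and 1 (assumed eps)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def inverse_log_odds(z):
    """Convert a log odds value back into a probability (logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_return(probabilities_per_model):
    """Summarise several model outputs location by location.

    probabilities_per_model[m][i] is the probability that model m
    (trained on the m-th random NTD) assigns to location i.
    Returns one (return, risk, probability) tuple per location.
    """
    results = []
    for location_probs in zip(*probabilities_per_model):
        z = [log_odds(p) for p in location_probs]
        ret = statistics.mean(z)      # "return": mean of the log odds ratios
        risk = statistics.pstdev(z)   # "risk": standard deviation (population form assumed)
        results.append((ret, risk, inverse_log_odds(ret)))
    return results


# Made-up outputs of three models at three locations (illustration only).
example = [
    [0.90, 0.50, 0.10],
    [0.92, 0.80, 0.12],
    [0.88, 0.20, 0.08],
]
summary = risk_return(example)
for i, (ret, risk, prob) in enumerate(summary):
    print(f"location {i}: return={ret:.3f} risk={risk:.3f} probability={prob:.3f}")
```

In this toy input, the first location is rated consistently high by all three models, so it has a high return and a low risk, whereas the second location has a mean probability near 0.5 but a large spread, which marks it as uncertain.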

Keywords

Landslides / Debris flows / Collapses / Ground fissures / Geologic hazard prevention and control engineering / Geologic hazard susceptibility assessment / Negative training dataset / Average spatial correlation / Random forest algorithm / Risk and return analysis / Geological survey engineering / Loess Plateau area

Cite this article

Hao Cheng, Wei Hong, Zhen-kai Zhang, Zeng-lin Hong, Zi-yao Wang, Yu-xuan Dong. Impacts of random negative training datasets on machine learning-based geologic hazard susceptibility assessment. China Geology, 2025, 8(4): 676-690. DOI: 10.31035/cg2024094

CRediT authorship contribution statement

Hao Cheng, Wei Hong, and Zhen-kai Zhang conceived the presented idea. Hao Cheng, Zeng-lin Hong, and Zi-yao Wang organized the dataset interpretation work. Hao Cheng and Yu-xuan Dong wrote the manuscript. All authors discussed the results and contributed to the final manuscript.

Declaration of competing interest

The authors declare no conflicts of interest.

Acknowledgment

This study was supported by a project entitled Loess Plateau Region-Watershed-Slope Geological Hazard MultiScale Collaborative Intelligent Early Warning System of the National Key R&D Program of China (2022YFC3003404), a project of the Shaanxi Youth Science and Technology Star (2021KJXX-87), and public welfare geological survey projects of Shaanxi Institute of Geologic Survey (20180301, 201918, 202103, and 202413). The authors would like to extend their profound gratitude to the editors and anonymous reviewers of this manuscript for their constructive advice and corrections.
