Flood prediction model based on decision trees
You ZUO
Water Resources and Hydropower Engineering, 2025, Vol. 56, Issue (8): 19-31.
[Objective] Floods are natural disasters triggered by factors such as heavy rainfall, rapid snow and ice melt, and storm surges. They often cause significant economic losses and severe disruption to daily life. Conventional flood prediction relies primarily on traditional hydrological methods and experience-based statistical models. However, in areas that lack long-term, continuous hydrological monitoring data, alternative data-driven methods for flood prediction are essential. [Methods] Decision-tree-based machine learning algorithms, including Random Forest, XGBoost, and LightGBM, have shown excellent performance in classification and regression tasks owing to their interpretability and strong predictive capability, which makes them suitable for flood prediction. A dataset of 50 000 records and 21 variables was used to evaluate the flood prediction performance of these three algorithms. The models were assessed on prediction accuracy and on identification of key variables, and the ROC curve and its AUC were used for comparative analysis. [Results] All three models achieved high prediction accuracy. The XGBoost model had the lowest mean squared error (0.000 186 2) and the highest coefficient of determination (0.925 2), while the LightGBM model achieved the highest area under the ROC curve (AUC = 0.99). The Random Forest model underperformed the other two on all indicators. [Conclusion] XGBoost delivers the best performance for flood probability prediction, with the lowest prediction error, while LightGBM is the best choice for binary classification tasks such as predicting flood occurrence.
flood prediction / decision trees / Random Forest / XGBoost / LightGBM
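The abstract compares the three models using mean squared error (MSE), the coefficient of determination (R²), and the area under the ROC curve (AUC). As a minimal sketch, not the authors' code, these metrics can be computed in plain Python as below. The function names, the toy values, and the 0.5 threshold used to turn a flood probability into a flood/no-flood label are illustrative assumptions; the abstract does not state how the binary labels were derived. AUC is computed here with the rank-sum (Mann-Whitney U) identity, which equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared residuals between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def binarize(probabilities: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Hypothetical helper: label a case as a flood (1) if its probability
    reaches the threshold. The 0.5 default is an assumption."""
    return [1 if p >= threshold else 0 for p in probabilities]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.
    Tied scores receive their average rank, so each tie counts as 1/2."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of indices that share the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy values for illustration only; they are not from the study's dataset.
    observed = [0.45, 0.50, 0.55, 0.60]   # observed flood probabilities
    predicted = [0.46, 0.49, 0.56, 0.58]  # a model's predicted probabilities
    print("MSE:", mean_squared_error(observed, predicted))
    print("R2: ", r2_score(observed, predicted))
    print("AUC:", roc_auc(binarize(observed), predicted))
```

Note that MSE and R² treat the task as regression on the probability itself, whereas AUC needs discrete class labels plus continuous scores. This is why the abstract can rank XGBoost first on one set of indicators and LightGBM first on the other.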