This study aims to evaluate the effectiveness of machine learning techniques for predicting groundwater fluctuations in arid and semi-arid regions using data from the Gravity Recovery and Climate Experiment satellite mission. The primary objective is to develop accurate predictive models for groundwater level changes by leveraging the unique capabilities of GRACE satellite data in conjunction with advanced machine learning algorithms. Three widely-used machine learning models, namely DT, SVM and RF, were employed to analyze and model the relationship between GRACE satellite data and groundwater fluctuations in South Khorasan Province, Iran. The study utilized 151 months of GRACE data spanning from 2002 to 2017, which were correlated with piezometer well data available in the study area. The JPL model was selected based on its strong correlation (R2 = 0.9368) with the observed data. The machine learning models were trained and validated using a 70/30 split of the data, and their performance was evaluated using various statistical metrics, including RMSE, R2 and NSE. The results demonstrated the suitability of machine learning approaches for modeling groundwater fluctuations using GRACE satellite data. The DT model exhibited the best performance during the calibration stage, with an R2 value of 0.95, RMSE of 0.655, and NSE of 0.96. The SVM and RF models achieved R2 values of 0.79 and 0.65, and NSE values of 0.86 and 0.71, respectively. For the prediction stage, the DT model maintained its high efficiency, with an RMSE of 1.48, R2 of 0.87, and NSE of 0.90, indicating its robustness in predicting future groundwater fluctuations using GRACE data. The study highlights the potential of machine learning techniques, particularly Decision Trees, in conjunction with GRACE satellite data, for accurate prediction and monitoring of groundwater fluctuations in arid and semi-arid regions. 
The findings demonstrate the effectiveness of the DT model in capturing the complex relationships between GRACE data and groundwater dynamics, providing reliable predictions and insights for sustainable groundwater management strategies.
Mobin Eftekhari, Abbas Khashei-Siuki.
Evaluating machine learning methods for predicting groundwater fluctuations using GRACE satellite in arid and semi-arid regions.
J. Groundw. Sci. Eng., 2025, 13(1): 5-21 DOI:10.26599/JGSE.2025.9280035
Groundwater refers to water that exists at relatively significant depths and can move freely within the earth's subsurface through coarse pores, fissures, and fractures under the influence of gravity. The layer that holds it is referred to as a water-bearing layer (Li et al. 2023). Groundwater is accessed by drilling wells into these layers, highlighting the importance of groundwater exploitation (Haileslassie and Gebremedhin, 2015). Excessive use of groundwater can contribute to drought conditions. Drought is a continuous and sustained period during which water resources decrease significantly, resulting in reduced moisture in the soil and surface water bodies; it can occur in any climate (Gleeson et al. 2020). Although drought causes environmental damage and economic and social losses, it receives less attention than other phenomena (Wilhite and Glantz, 1985). Constructing and monitoring piezometric wells has long been one of the common methods for investigating groundwater levels and their changes (Coelho et al. 2018). However, because the measurements are point-based, lack spatial coverage, and are costly, this method is often neither practical nor cost-effective (Font-Capo et al. 2015). The launch of gravity-sensing satellites such as GRACE has opened a new chapter in hydrological studies and in estimating changes in groundwater (Meyer et al. 2019). Owing to its suitable spatial and temporal coverage, the GRACE satellite can be used alongside hydrological models as an alternative method to estimate changes in groundwater levels (Yang et al. 2014). The mission provides monthly gravity field solutions that reflect variations in water mass at and near the Earth's surface, and these variations can be converted into changes in total water storage. 
By accounting for changes in soil moisture, which plays a significant role in regional water changes, and for other storage components, the remaining amount can be attributed to groundwater (Wouters et al. 2014). Pragnaditya and colleagues (2021) used a Support Vector Machine (SVM) model to predict rapid declines in Groundwater Level (GWL) based on Gravity Recovery and Climate Experiment (GRACE) gravity and water data, along with groundwater storage data from a land surface model and meteorological variables. Their findings indicate that GRACE-based estimates can successfully be used to predict GWLA (Groundwater Level Anomaly) in most parts of the IGBM (Indus-Ganges-Brahmaputra-Meghna) basin. When additional meteorological variables are included as independent variables alongside GRACE GWSA (Groundwater Storage Anomaly), model performance improves significantly. However, relatively poor performance is observed in intensively farmed areas. This is primarily attributed to the coarse resolution of GRACE products, which limits their ability to account for groundwater pumping and for highly heterogeneous groundwater extraction at local scales. Additionally, the differing predictive potential of observation wells at different aquifer depths relates to the fact that groundwater in the region is primarily extracted from deeper aquifer portions, and to their representation in localized modeling. Predicting GWLAs using GRACE therefore has limitations. Kumar and Bhattacharjya (2021) applied the GRNN (Generalized Regression Neural Network) model to predict groundwater level fluctuations in the Uttarakhand state of India using GRACE data. The groundwater level in this state decreased by 50% from 2007 to 2010, and the GRNN model, a supervised machine learning method, predicted it with an acceptable level of accuracy (Kumar and Bhattacharjya. 2021). 
Sun (2013) developed Artificial Neural Network (ANN) models for direct prediction of groundwater level changes using GRACE products and other general hydrometeorological data. As a feasibility study, ensemble ANN models were used to predict monthly and seasonal changes in groundwater levels in several wells across different regions of the United States. The results indicated that GRACE data play a modest yet significant role in the performance of the ANN ensembles (Sun, 2013). In 2022, Ram employed unsupervised learning methods to derive representations from GRACE satellite data for predicting groundwater fluctuations. When these representations were used as inputs to groundwater prediction models, they reduced RMSE (Root Mean Square Error) by up to 19% and improved NSE (Nash-Sutcliffe Efficiency) by up to 8 times compared to traditional satellite inputs at three different spatial scales: national, state, and county. This improvement indicates that novel methods for monitoring groundwater fluctuations using the GRACE satellite provide acceptable results (Ram, 2022).
Seo and Lee (2021) focused on predicting changes in groundwater storage in space and time through the integration of multiple satellite datasets and deep learning models. In this study, two deep learning models, LSTM (Long Short-Term Memory) and CNN-LSTM (Convolutional Neural Network-Long Short-Term Memory), were developed using satellite data. These data included terrestrial water storage anomalies from GRACE satellites, precipitation from TRMM (Tropical Rainfall Measuring Mission) satellites, temperature and humidity from GLDAS (Global Land Data Assimilation System), and NDVI (Normalized Difference Vegetation Index) and MNDWI (Modified Normalized Difference Water Index) indices from Landsat 5 and 8 satellites. Comparisons with field measurements demonstrated that the CNN-LSTM model exhibits higher accuracy and significant improvements over the LSTM model, particularly in capturing spatiotemporal trends. Additionally, comparisons with NDVI, MNDWI, and TWSA (Terrestrial Water Storage Anomaly) data showed that changes in land cover and overall water storage are influential. The incorporation of satellite-derived parameters as training data for deep learning models substantially enhances model performance (Seo and Lee, 2021).
South Khorasan Province in eastern Iran has a warm and arid climate. Water scarcity is considered a fundamental issue in this region, and recent droughts have imposed serious constraints on water resources. Being situated in a semi-arid to arid zone, South Khorasan Province is not exempt from this predicament. Factors such as surface water scarcity, decreasing average annual rainfall, and recurring droughts have made human water needs dependent on groundwater (Eftekhari et al. 2019). Overuse of groundwater has led to a decline in groundwater levels, resulting in consequences such as land subsidence and the formation of sinkholes in various areas. 
Estimating the extent of groundwater and its fluctuations requires scientific studies, specialized investigations, and practical research in the field of water resource management (Khanlari et al. 2012). The conventional method for measuring groundwater level changes involves the use of piezometric wells, which, despite their suitable accuracy, come with limitations such as high costs, lack of comprehensive information, and time-consuming processes due to their limited study areas (Kalbus et al. 2006). In arid and semi-arid regions, groundwater fluctuations play a crucial role in understanding and managing groundwater resources, which provide a significant portion of the world's drinking water (36%) and irrigation water (42%) (Zhang et al. 2024). The dynamics of the water table and its response to recharge events provide insights into the sustainability of groundwater extraction rates: the water table's response to infiltrating water can be used to estimate recharge, which is vital information for assessing the risk of groundwater resource depletion due to human activities (Gong et al. 2023). However, estimating recharge is challenging due to the high spatial heterogeneity and temporal variability of the relationship between precipitation and recharge (Gong et al. 2021). The Groundwater Level Fluctuation (GLF) method, which relates changes in water table elevations to recharge rates through the specific yield parameter, is a widely used approach for estimating recharge in arid and semi-arid regions (Gong et al. 2021). Understanding and quantifying the processes that control groundwater recharge fluctuations is critically important for effective resource management and for securing water supplies for human needs and ecosystems in drylands. Groundwater recharge is one of the least understood components of groundwater systems because its wide spatial and temporal variability makes direct measurement difficult (Zhang et al. 2024). 
Remote sensing offers one fundamental solution, serving as an alternative that allows researchers to reduce costs and obtain more accurate results. While humans play a significant role in gathering and interpreting terrestrial data, remote sensing methods are primarily executed by sensors (Yao et al. 2019). The primary objective of this research is to develop and evaluate machine learning models that leverage GRACE satellite data for accurate prediction of groundwater level fluctuations in South Khorasan Province, with a focus on long-term forecasting capabilities. This study uses random forest, support vector machine, and decision tree models and compares their performance. So far, the simultaneous examination of these methods using the GRACE satellite has not received sufficient attention in the study area.
1 Overview of machine learning models
This study employs three widely-used and powerful Machine Learning (ML) models for predicting groundwater level fluctuations: Support Vector Machines (SVMs), Random Forests (RFs), and Decision Trees (DTs). These models were carefully selected due to their proven ability to effectively handle and learn from the complex spatiotemporal data generated by the GRACE satellite mission, as well as their diverse approaches to modeling and learning from data, which allows for a robust comparison and evaluation of their respective strengths and limitations. Support Vector Machines (SVMs), introduced by Vladimir Vapnik and colleagues in the 1990s, have gained significant popularity for their exceptional performance in both classification and regression tasks (Joachims. 2012). The fundamental principle behind SVMs is to find an optimal hyperplane that separates data points of different classes (in classification) or approximates the relationship between input features and the target variable (in regression) with maximum margin. This approach allows SVMs to effectively model non-linear relationships and handle high-dimensional data, making them well-suited for applications involving remote sensing datasets like GRACE, where the number of input features can be large. Random Forests (RFs), on the other hand, are a powerful ensemble learning method that combines multiple decision trees to improve accuracy and robustness in predictions (Fawagreh et al. 2014). RFs evolved from the concept of Classification and Regression Trees (CART) and leverage the strengths of individual decision trees while mitigating their weaknesses through an ensemble approach. By constructing a large number of decision trees, each trained on a different subset of the data and considering a random subset of features for splitting, RFs can effectively capture complex, non-linear patterns and reduce the risk of overfitting, leading to more reliable predictions. 
Decision Trees (DTs) form the building blocks of Random Forests and are widely used for both classification and regression tasks due to their interpretability and ability to handle non-linear relationships (Rivera-Lopez et al. 2022). DTs operate by recursively partitioning the input data into smaller subsets based on a series of decisions or rules, ultimately forming a tree-like structure. The prediction process involves traversing the tree from the root node to the leaf nodes, following the decisions or rules defined at each internal node based on the input feature values. This intuitive and easily interpretable structure makes DTs valuable for understanding the relationships between input features and the target variable. By employing these three machine learning models – SVMs, RFs, and DTs – this study leverages their diverse strengths and learning approaches to comprehensively analyze and model the complex relationships between GRACE satellite data, environmental factors, and groundwater level fluctuations. The comparison of these models' performances and their respective advantages and limitations provide valuable insights into the most effective strategies for predicting and monitoring groundwater resources using remotely sensed data.
1.1 Support vector machines
Support Vector Machines (SVMs) employ a mathematical approach to predicting groundwater fluctuations that distinguishes them from decision trees. They find a hyperplane in a high-dimensional space that best separates the data (in classification) or best fits the groundwater level values (in regression). This allows SVMs to capture complex relationships between various factors influencing groundwater, such as GRACE data, time, and location. Kernel functions are used to construct this hyperplane, making SVMs particularly adept at handling these intricate relationships (Sahour et al. 2022). Furthermore, SVMs are well-suited for analyzing large, complex datasets like those obtained from remote sensing. They are robust to high-dimensional data with many variables and can limit overfitting through regularization techniques. While not as easy to interpret as decision trees, SVMs can still provide insights into the importance of different factors influencing groundwater levels through feature ranking or selection (Gilbert et al. 2023). This study utilizes SVMs to develop accurate models for groundwater monitoring and management. Their ability to handle complex relationships and high-dimensional data and to limit overfitting makes SVMs a valuable tool for effectively analyzing GRACE satellite data and other relevant variables to predict groundwater fluctuations.
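To make the role of the kernel function and of regularization more concrete, the following minimal sketch shows two building blocks commonly associated with support vector regression: a Gaussian (RBF) kernel and the epsilon-insensitive loss. It is an illustration only and not the implementation used in this study; the function names, the `gamma` and `epsilon` values, and the choice of an RBF kernel are all assumptions.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: similarity between two feature vectors.
    gamma is an illustrative value, not one taken from the study."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon tube cost nothing; larger errors
    are penalized linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Prediction of a trained kernel model: a weighted sum of kernel
    similarities to the support vectors plus a bias term.
    The coefficients are assumed to come from a prior training step."""
    return bias + sum(
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, dual_coefs)
    )
```

In such a model only the support vectors enter the prediction, which is one reason SVMs can work with relatively few training samples.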
1.2 Random forests
Random forests, a powerful ensemble learning method using multiple decision trees, are well-suited for predicting groundwater fluctuations from GRACE satellite data (Schelter, 2021). By combining numerous decision trees, they achieve higher accuracy and robustness than individual trees. This approach reduces the impact of biases and overfitting, leading to more reliable predictions (Seni and Elder, 2010). Random forests excel at capturing complex relationships within groundwater systems. They can effectively model how various factors like GRACE data, time, and location influence groundwater fluctuations. Additionally, they identify the most influential data points within the GRACE data, guiding future data collection and model development. Furthermore, their inherent robustness to noise and outliers makes them ideal for handling real-world satellite data with potential inconsistencies. Finally, their ability to be parallelized allows for efficient training on large datasets, making them a powerful tool for groundwater monitoring and management (Ali et al. 2021).
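The ensemble idea can be sketched in a few lines. The example below is a deliberately simplified stand-in for a random forest: each "tree" is a one-split regression stump, trained on a bootstrap sample and restricted to one randomly chosen feature, and the ensemble prediction is the average over all stumps. It is not the model configuration used in this study; the function names, the number of trees, and the use of stumps instead of full trees are assumptions made for brevity.

```python
import random

def fit_stump(X, y, feature):
    """Best single-threshold split on one feature (least squares)."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - ml) ** 2 for t in left)
               + sum((t - mr) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # feature is constant in this sample
        mean = sum(y) / len(y)
        return (feature, 0.0, mean, mean)
    return (feature, best[1], best[2], best[3])

def fit_forest(X, y, n_trees=50, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(
            fit_stump([X[i] for i in idx], [y[i] for i in idx], feature)
        )
    return forest

def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    preds = [ml if row[f] <= thr else mr for f, thr, ml, mr in forest]
    return sum(preds) / len(preds)
```

Averaging many weak, partly decorrelated learners is what reduces the variance of the ensemble relative to a single tree.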
1.3 Decision trees
Decision trees are a machine learning method used here to predict groundwater fluctuations from GRACE satellite data. They work by splitting the data based on a series of rules, building a tree-like structure. This process continues until the data is sufficiently categorized, leading to final predictions based on the specific features that reached each terminal node (Maimon and Rokach. 2014). Decision trees offer several advantages for this task. They are easy to understand, which is important for stakeholders to grasp the reasoning behind predictions. Additionally, they can capture complex relationships between various factors impacting groundwater and can identify the most relevant data points for accurate predictions. Furthermore, they are resilient to outliers and missing data, making them suitable for working with real-world satellite data. By leveraging decision trees, this study aims to develop insightful and robust models for groundwater fluctuation prediction, aiding in groundwater monitoring and management (Liu et al. 2022).
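The recursive splitting described above can be illustrated with a small regression tree. This is a minimal sketch under stated assumptions (least-squares splits, a fixed maximum depth, leaves that store the mean target); the function names and parameters are hypothetical, and the study's actual implementation is not specified in the text.

```python
def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursively partition (X, y); each leaf stores the mean target."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < min_samples:
        return {"value": mean}
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            sse = 0.0
            for side in (left, right):
                m = sum(y[i] for i in side) / len(side)
                sse += sum((y[i] - m) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, f, thr, left, right)
    if best is None:  # no feature can split these samples
        return {"value": mean}
    _, f, thr, left, right = best
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }

def predict(tree, row):
    """Walk from the root to a leaf, following the split rules."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]
```

Because the fitted model is just a nested set of threshold rules, it can be printed and inspected directly, which is the interpretability advantage mentioned above.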
2 Materials and methods
2.1 Case study
South Khorasan Province, spanning 150,800 square kilometers, is located in eastern Iran and has a dry and semi-arid climate; its center is the city of Birjand. The region is characterized by a complex geological setting, with sedimentary rocks of various ages and lithologies underlying the surface. The primary aquifers in the region are hosted within the fractured and porous formations of sandstone, limestone, and conglomerate units. These aquifers are generally semi-confined to confined, with varying degrees of hydraulic connectivity and recharge mechanisms. The depth to the water table ranges from a few meters to several hundred meters below the surface, depending on the location and the specific aquifer unit. The distribution and burial depth of these aquifers play a crucial role in determining groundwater flow patterns and availability within the province. The province shares borders with Razavi Khorasan Province to the north, Afghanistan to the east, Yazd, Isfahan, and Semnan provinces to the west, and Kerman and Sistan and Baluchestan provinces to the south. In terms of hydrological divisions, South Khorasan Province includes parts of four major basins: the Loot Desert, Hamoun-e Hirmand, Khaf Salt Lake, and Central Desert. The province comprises a total of 35 study areas, including 9 open study areas, 18 restricted study areas, and 8 critically restricted study areas. This study incorporated data from 200 piezometers distributed across South Khorasan Province, whose locations are illustrated in Fig. 1 to provide spatial context for the groundwater monitoring network (Rajaee et al. 2011).
2.2 Methodology
One of the methods for studying water resources is through the measurement of the Earth's gravity field and its variations. Changes in the Earth's gravity field are similar to the changes on a large body and their effects on the surrounding environment. Changes in the distribution of the Earth's mass can lead to variations in the Earth's gravity, which can be investigated using satellite and ground measurement equipment (Swenson et al. 2002). Researchers in space sciences have been able to obtain comprehensive information about changes in groundwater reserves as a result of changes in the Earth's gravity using gravity-sensing satellites (Feng et al. 2018). GRACE satellites are among the gravity-sensing satellites that have shown high sensitivity to changes in water levels and establish a connection between changes in water levels and changes in the Earth's gravity field by providing an estimation of the Earth's gravity field (Chen et al. 2022). The Gravity Recovery and Climate Experiment (GRACE) project provides regular and monthly estimations of the Earth's gravity field in the form of harmonic geopotential models. GRACE satellite data is a new and valuable tool for monitoring groundwater. These satellites are currently the only remote sensing satellites capable of monitoring changes in groundwater levels. The primary application of GRACE satellites is to determine hydrological changes by continuously measuring changes in water stored in water bodies, soil, surface reservoirs, and snow with an accuracy of a few millimeters in water height over spatial dimensions of 400 kilometers on the Earth's surface. In this process, factors causing mass displacement in the aforementioned spatial dimensions on or within the Earth are monitored by GRACE over a monthly period. 
GRACE, through observations of Total Water Storage (TWS) changes and with the help of the Global Land Data Assimilation System (GLDAS) hydrological models, can be used to estimate changes in groundwater. GLDAS calculates factors such as soil moisture, snow water equivalent, and water stored in plant canopies. Therefore, after converting the outputs of the hydrological model into spherical harmonic coefficients, these hydrological contributions are subtracted from the total water storage changes, and the residual in the GRACE observations represents changes in groundwater (Afraza et al. 2021). Changes in the distribution of mass on Earth lead to variations in its gravity field. Measuring changes in the gravity field can identify changes in mass distribution and determine increases or decreases in mass in a region. To examine the Earth's gravity field over different time intervals, the GRACE gravity-sensing satellite can be utilized. The process of estimating groundwater changes using GRACE data involves several key steps: (1) utilization of the GLDAS hydrological model to determine the optimal state of the Earth's surface and quantify hydrological effects; (2) removal of these hydrological effects by calculating and subtracting spherical harmonic coefficients derived from the GLDAS model outputs; (3) computation of the difference between the residual spherical harmonic coefficients (after hydrological effect removal) and the coefficients obtained directly from GRACE; and (4) application of a wavelet transform to filter the resulting signal, yielding an estimate of groundwater changes. Finally, the changes derived from these monthly harmonic coefficients are transformed into groundwater level fluctuations specific to the study area, as described by Equation 1.
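For reference, Equation 1 presumably takes the standard spherical harmonic form for surface mass density change used in GRACE processing. The following is a sketch under that assumption, written with the symbols defined in the text and with $ a $ taken as the average radius of the Earth:

$$ \Delta \sigma (\theta ,\lambda ) = \frac{a\,{\rho }_{ave}}{3}\sum_{n=0}^{\infty }\sum_{m=0}^{n}\frac{2n+1}{1+{k}_{n}}\,{\bar{P}}_{nm}(\cos \theta )\left[\Delta {J}_{nm}\cos (m\lambda ) + \Delta {K}_{nm}\sin (m\lambda )\right] \tag{1} $$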
where ∆σ(θ, λ) represents the change in surface mass density (expressed in kg/m2) at a given colatitude θ and longitude λ. This term effectively quantifies the variation in water thickness, which we use as a proxy for groundwater level changes. In this relation, $ {\rho }_{ave} $ = 5,517 kg/m3 is the average density of the Earth, $ {k}_{n} $ denotes the Love numbers, $ \Delta {J}_{nm} $ and $ \Delta {K}_{nm} $ represent the monthly variations of the spherical harmonic coefficients, and $ {\bar{P}}_{nm} $ signifies the normalized Legendre functions.
To improve the estimation of the Earth's gravity field, the coefficient Wn can be introduced into Equation 1.
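Assuming the coefficient $ {W}_{n} $ is the Gaussian averaging kernel commonly used to smooth GRACE coefficients, Equations 2 to 4 would take the following form. This is a sketch under that assumption and not necessarily the authors' exact notation:

$$ \Delta \sigma (\theta ,\lambda ) = \frac{2\pi a\,{\rho }_{ave}}{3}\sum_{n=0}^{\infty }\sum_{m=0}^{n}\frac{2n+1}{1+{k}_{n}}\,{W}_{n}\,{\bar{P}}_{nm}(\cos \theta )\left[\Delta {J}_{nm}\cos (m\lambda ) + \Delta {K}_{nm}\sin (m\lambda )\right] \tag{2} $$

$$ b = \frac{\ln 2}{1-\cos (r/a)} \tag{3} $$

$$ {W}_{0} = \frac{1}{2\pi },\qquad {W}_{1} = \frac{1}{2\pi }\left[\frac{1+{e}^{-2b}}{1-{e}^{-2b}}-\frac{1}{b}\right],\qquad {W}_{n+1} = -\frac{2n+1}{b}{W}_{n}+{W}_{n-1} \tag{4} $$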
In Equation 3, r represents the averaging radius and $ a $ denotes the average radius of the Earth. Equation 4 is a recursive equation for calculating the averaging kernel W, which varies with the averaging radius. The result of Equation 2 is the surface density anomaly; dividing it by the density of water gives the vertical water height fluctuation for the study area. One of the most critical issues in GRACE data is the north-south striping that results from the near-polar orbit of the satellites (inclination of about 89 degrees): measurements are concentrated along north-south tracks, which causes spatial correlation among GRACE data points (Li et al. 2018).
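As an illustration of how a recursive averaging kernel of this kind can be evaluated, the sketch below computes Gaussian smoothing weights per spherical harmonic degree. It assumes the widely used Gaussian kernel with b = ln(2) / (1 - cos(r/a)) and the three-term recursion W(n+1) = -(2n+1)/b * W(n) + W(n-1); whether this matches the authors' Equations 3 and 4 exactly is an assumption. The function name and the 6371 km default for the Earth radius are illustrative choices.

```python
import math

def gaussian_weights(radius_km, n_max, earth_radius_km=6371.0):
    """Gaussian averaging weights W_0..W_n_max for a given averaging
    radius, computed with the standard forward recursion."""
    b = math.log(2.0) / (1.0 - math.cos(radius_km / earth_radius_km))
    w = [1.0 / (2.0 * math.pi)]                     # W_0
    if n_max >= 1:
        e = math.exp(-2.0 * b)
        w.append(w[0] * ((1.0 + e) / (1.0 - e) - 1.0 / b))  # W_1
    for n in range(1, n_max):
        # W_{n+1} = -(2n + 1) / b * W_n + W_{n-1}
        w.append(-(2 * n + 1) / b * w[n] + w[n - 1])
    return w

# Example: weights up to degree 20 for an assumed 400 km averaging radius
weights = gaussian_weights(400.0, 20)
```

The weights decrease with degree, so higher-degree (shorter-wavelength, noisier) coefficients contribute less to the smoothed field. The forward recursion can become numerically unstable at high degrees, so it is best treated as a sketch for low to moderate degrees.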
In geographical maps, these correlations appear as bands in the north-south direction. Wavelet analysis is a suitable tool for filtering and localizing signals and for investigating the effects of local and temporal variations in the gravity field. Since GRACE observes total mass changes, including changes in water reservoirs, it is essential to remove the other hydrological effects when the goal is to investigate changes in groundwater. The Global Land Data Assimilation System (GLDAS) combines satellite data and ground-based observations, using advanced land surface models and data assimilation techniques, to determine the optimal state of the Earth's surface (Springer et al. 2017). In the Google Earth Engine platform, GRACE gravity data are available as three solutions, provided by the German Research Centre for Geosciences (GFZ), the Jet Propulsion Laboratory (JPL), and the University of Texas Center for Space Research (CSR). Furthermore, in the last two decades, new-generation machine learning techniques for data mining have been significantly developed.
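The removal of hydrological effects described in this section is carried out on spherical harmonic coefficients, but the underlying water balance can be sketched in the grid domain: the groundwater anomaly is the total water storage anomaly minus the anomalies of the modeled storage components. The example below is a simplified illustration, assuming all series are monthly values in the same units (for example, cm of equivalent water height) and that the modeled components are converted to anomalies about their own mean; the function and argument names are hypothetical.

```python
def to_anomaly(series):
    """Subtract the series mean so that values become anomalies.
    Assumption: the mean of the supplied period is used as the baseline."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]

def groundwater_anomaly(twsa, soil_moisture, snow_water, canopy_water):
    """Groundwater storage anomaly as the residual of the water balance:
    GWSA = TWSA - (soil moisture + snow water + canopy water anomalies).
    twsa is assumed to be an anomaly series already."""
    sm = to_anomaly(soil_moisture)
    sw = to_anomaly(snow_water)
    cw = to_anomaly(canopy_water)
    return [t - (a + b + c) for t, a, b, c in zip(twsa, sm, sw, cw)]

# Example with made-up numbers (not data from the study)
gwsa = groundwater_anomaly(
    twsa=[-1.0, 0.0, 1.0],
    soil_moisture=[9.0, 10.0, 11.0],
    snow_water=[0.0, 0.0, 0.0],
    canopy_water=[1.0, 1.0, 1.0],
)
```

In practice the GRACE anomalies and the modeled components need a common baseline period, so the mean used in `to_anomaly` would be taken over that period.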
2.2.1 Data preparation for machine learning models
The integration of GRACE-derived data with ground-based piezometric measurements and their structuring for Machine Learning (ML) analysis is a critical step in our methodology. Here, we outline the data preparation process and the resulting input structures for our ML models. From the GRACE mission, we obtained monthly gravity field solutions, processed to estimate changes in groundwater storage (∆σ). These estimates have a spatial resolution of 1° × 1° (approximately 111 km at the equator) and a temporal resolution of one month. For each grid cell covering South Khorasan Province, we extracted a time series of ∆σ values from 2002 to 2017, resulting in 151 data points per cell (Frappart and Ramillien, 2018). Data from the 200 piezometers distributed across the province were aggregated to match the spatial and temporal resolution of the GRACE data. For each 1° × 1° grid cell, we calculated the mean monthly groundwater level from all piezometers within that cell. This aggregation helps reduce noise and aligns the ground truth data with the GRACE observations. We then paired each GRACE-derived ∆σ value with its corresponding aggregated piezometric measurement, creating a dataset in which each data point consists of the GRACE-derived ∆σ (change in groundwater storage) as the input feature and the change in groundwater level from piezometric data as the target variable. Both input and target variables were normalized using min-max scaling to the range [0, 1] to facilitate ML model training. To capture the temporal dependencies in groundwater dynamics, we created time-lagged features. For each time step t, we included ∆σ values from t-1, t-2, and t-3 (the previous three months) as additional input features. This structure allows the ML models to learn from recent historical patterns when making predictions (Wang and Gupta, 2024). The resulting dataset was divided into training (70%) and testing (30%) subsets. 
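The preparation steps in this section (min-max scaling, three time-lagged features, and a 70/30 split in time order) can be sketched as follows. This is a minimal illustration and not the authors' code; the function names and the plain-list data layout are assumptions.

```python
def min_max_scale(values):
    """Scale a series to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def make_lagged_dataset(grace, level_change, n_lags=3):
    """Build rows [x(t), x(t-1), ..., x(t-n_lags)] with target y(t).
    The first n_lags months are dropped because their lags are missing."""
    X, y = [], []
    for t in range(n_lags, len(grace)):
        X.append([grace[t - k] for k in range(n_lags + 1)])
        y.append(level_change[t])
    return X, y

def chronological_split(X, y, train_fraction=0.7):
    """Earlier samples go to training, later samples to testing."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

With 151 monthly values and three lags, this layout yields 148 samples per grid cell, each with four input features.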
The split was performed chronologically, with earlier data used for training and later data for testing, to simulate real-world forecasting conditions (King et al. 2022). For each ML model (Decision Trees, Random Forests, and Support Vector Machines), the input data structure consists of ∆σ(t), ∆σ(t-1), ∆σ(t-2), and ∆σ(t-3) as features, and the change in groundwater level at time t as the target. This structure provides a rolling window of GRACE-derived information to predict contemporaneous changes in groundwater levels. By incorporating time-lagged features, we enable the models to capture both the magnitude and trend of groundwater storage changes, which is crucial for accurate predictions (Roy et al. 2023). Machine learning techniques can be used to discover and extract knowledge from databases as well as to create predictive models (Shouval et al. 2014). The primary goal of these machine learning models is to find a useful approximate function that describes the relationship between input variables and desired outcomes (Wang et al. 2022). In this study, three machine learning models, namely Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT), were used to predict fluctuations in groundwater levels in the study area. Support Vector Machine (SVM) is a machine learning technique based on non-parametric, supervised statistical learning (Dong et al. 2021). This method was introduced by Vapnik and his colleagues in 1992 based on the theory of statistical learning. In the following years, they introduced the theory of the optimal hyperplane as a linear classifier and introduced non-linear classifiers using kernel functions (Yaman and Cengiz, 2021). The fundamental principles of Support Vector Machine, which is now recognized as a valid method, can be traced back to the work of Vapnik and his colleagues, and ultimately, the extension of Support Vector Machine to regression was achieved by Vapnik in 1995 (Sansone et al. 
2013). This method uses Support Vector Machine classification models to solve classification problems where data belong to different classes, and Support Vector Machine regression models are used to solve prediction problems (Bhavsar and Panchal, 2012). The main feature of this method is its high ability to use fewer training samples while achieving higher accuracy compared to other methods (Shao and Lunetta, 2012). Random Forest method is a novel and powerful approach in the field of data mining that has brought significant improvements in this area. The Random Forest technique has evolved as an expanded model of the Classification and Regression Tree (CART) method (Ziegler and König, 2014). In other words, Random Forest is a tree-based learning method. This method is capable of learning complex patterns and considering nonlinear relationships between explanatory and dependent variables (Louppe, 2014). The training process of the tree starts with a repeatable process that begins at the root node and ends at the terminal nodes (leaves). Then, a new sample is selected, and another tree is trained. Once the tree is complete, a set of decision rules is extracted to estimate new data (Genuer et al. 2020). The Random Forest method offers several advantages over other methods such as high prediction accuracy, ability to learn nonlinear relationships, high capability in determining important variables in prediction, and non-parametric nature (Matin et al. 2018). The Decision Tree algorithm is one of the strongest and most widely used machine learning algorithms utilized in both classification and regression tasks (Charbuty and Abdulazeez, 2021). This algorithm employs a tree structure to represent decisions and their combinations based on input features and is easily interpretable (Barros et al. 2011). The tree structure consists of root nodes, internal nodes, and leaves, which partition the data based on various features (Patel and Prajapati, 2018). 
Each decision node in the decision tree is determined based on a selected feature, and the decision-making process starts from the root and continues by recursively splitting the data into sub-nodes (Zhu et al. 2018). The training process of the decision tree involves selecting the optimal feature, dividing the data based on this feature, and repeating this process to create leaf nodes (Rai et al. 2016). Although this algorithm is easily interpretable and performs well on clean data, it may suffer from overfitting and sensitivity to small changes in the data (Hilario et al. 2006). Overall, the Decision Tree algorithm is recognized as a powerful tool in decision-making for complex problems. Machine learning modeling using data obtained from the GRACE satellite from 2003 to 2017 has been conducted in the Google Earth Engine environment. To identify the most suitable GRACE satellite data product for our study area, we evaluated the correlation between ground subsidence measurements and the outputs from three different processing algorithms: JPL, CSR, and GFZ. The algorithm exhibiting the highest correlation with observed subsidence was selected for further analysis. Validation was performed using data from 2013 to 2016 to select the optimal prediction model. Then, by selecting the sample pattern, the prediction of future groundwater fluctuations for the next 10 years was addressed. In this study, machine learning methods were implemented in the Python environment. During the modeling process using these methods, the data were divided into two categories: Training data and testing data. In this research, 70% of the total data were allocated for model training, and the remaining 30% were used as testing data for the model. Random selection of training and testing data was performed using software, ensuring that the data from each stage were not used in the other stage.
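The lagged input structure and the chronological variant of the 70/30 split described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `make_lagged_samples` and `chronological_split` and the toy series are hypothetical, and only the Python standard library is used.

```python
from typing import List, Sequence, Tuple


def make_lagged_samples(
    grace: Sequence[float], groundwater: Sequence[float], n_lags: int = 3
) -> Tuple[List[List[float]], List[float]]:
    """Build rows [x(t), x(t-1), ..., x(t-n_lags)] with target y(t)."""
    if len(grace) != len(groundwater):
        raise ValueError("series must have the same length")
    features, targets = [], []
    # The first n_lags months have no complete history, so they are skipped.
    for t in range(n_lags, len(grace)):
        features.append([grace[t - k] for k in range(n_lags + 1)])
        targets.append(groundwater[t])
    return features, targets


def chronological_split(features, targets, train_fraction: float = 0.7):
    """Earlier samples go to training, later samples to testing (no shuffling)."""
    cut = int(len(features) * train_fraction)
    return features[:cut], targets[:cut], features[cut:], targets[cut:]


# Toy monthly series (hypothetical values, for illustration only).
grace_anomaly = [float(v) for v in range(13)]
gw_change = [0.5 * v for v in grace_anomaly]

X, y = make_lagged_samples(grace_anomaly, gw_change)
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(len(X), len(X_train), len(X_test))
print(X[0], y[0])
```

Because the rows are kept in time order, every training sample precedes every testing sample, which is what distinguishes this split from a random one.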
2.2.2 Data processing and filtering
The raw data from the GRACE satellite mission require significant processing and filtering to extract meaningful information about groundwater storage changes. The key steps in this process involve spherical harmonic analysis, removal of non-hydrological signals, filtering and smoothing, leakage correction, and validation and calibration (Humphrey et al. 2023). GRACE satellite measurements of Earth's gravitational field variations are represented as changes in spherical harmonic coefficients. These coefficients describe the gravitational potential field using a series of spherical harmonic functions. Processing GRACE data involves converting raw measurements into monthly sets of spherical harmonic coefficients, which can then be analyzed to obtain information about mass redistribution on Earth's surface (Chen, 2019). GRACE data reflect changes in Earth's gravity field caused by various factors, including groundwater storage, surface water, soil moisture, and glacial ice mass. To isolate groundwater signals, contributions from other sources must be removed. This is done by incorporating data from complementary models and observations, such as the Global Land Data Assimilation System (GLDAS), which provides estimates of soil moisture, snow cover, and surface water variations (Moore and Fisher, 2012). GRACE data are noisy due to measurement errors, atmospheric disturbances, and limited spatial resolution. To reduce noise and enhance the signal-to-noise ratio, various filtering and smoothing techniques are applied, such as Gaussian smoothing, de-striping filters, and wavelet-based filters. The choice of filtering method depends on the study area's characteristics and analysis objectives (Werth et al. 2009). The limited spatial resolution of GRACE data can cause signal leakage from adjacent regions, leading to potential errors in estimating groundwater storage changes. 
Leakage correction techniques, involving scaling or gain factors, adjust GRACE data based on expected signal strength and leakage patterns in the study area (Longuevergne et al. 2010). To ensure the reliability and accuracy of processed GRACE data, validation and calibration are essential. This involves comparing GRACE-derived groundwater storage changes with ground-based observations, such as well measurements or other in-situ data sources. Validation helps assess GRACE data performance and identify potential biases or discrepancies, which can then be addressed through further processing refinements or additional data sources (Frappart and Ramillien, 2018). Data processing and filtering are crucial for extracting reliable information about groundwater storage changes from GRACE satellite data. These steps mitigate the inherent limitations and uncertainties of satellite measurements, enabling more accurate and meaningful interpretations for groundwater monitoring and management (Chen et al. 2016).
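The signal separation and scaling steps described above can be illustrated with a simple mass-balance sketch. It assumes that the groundwater storage anomaly is the gain-corrected total water storage anomaly minus the soil moisture, snow, and surface water anomalies, all expressed in the same units (for example, cm of equivalent water height). The function name `groundwater_anomaly`, the gain value, and the numbers are hypothetical, and applying the gain factor before the subtraction is itself an assumption.

```python
from typing import List, Sequence


def groundwater_anomaly(
    tws: Sequence[float],
    soil_moisture: Sequence[float],
    snow: Sequence[float],
    surface_water: Sequence[float],
    gain: float = 1.0,
) -> List[float]:
    """Return groundwater storage anomalies in the same units as the inputs."""
    series = (tws, soil_moisture, snow, surface_water)
    if len({len(s) for s in series}) != 1:
        raise ValueError("all series must cover the same months")
    # Scale the GRACE signal, then remove the non-groundwater components.
    return [
        gain * t - (sm + sn + sw)
        for t, sm, sn, sw in zip(tws, soil_moisture, snow, surface_water)
    ]


# Hypothetical monthly anomalies in cm of equivalent water height.
tws = [-2.0, -3.5, -5.0]
soil = [-0.5, -1.0, -1.5]
snow = [0.0, 0.1, 0.0]
surface = [-0.2, -0.2, -0.3]

gwsa = groundwater_anomaly(tws, soil, snow, surface, gain=1.1)
print([round(v, 2) for v in gwsa])
```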
2.2.3 Model training and validation
The success of machine learning models in predicting groundwater fluctuations relies on proper training and validation procedures (Singha et al. 2021). This section outlines the process of training and validating the Decision Tree, Random Forest, and Support Vector Machine (SVM) models used in this study, including data splitting, evaluation metrics, and related techniques.
Before training, the available data were split into a training set and a testing set. The training set was used to fit the models, allowing them to learn the patterns and relationships between the input features (e.g. GRACE satellite data, time, spatial coordinates) and the target variable (groundwater fluctuations). The testing set was held out during training and used to evaluate the trained models on unseen data. In this study, the data were split into 70% for training and 30% for testing using stratified sampling techniques to ensure that both sets were representative of the entire data distribution (Seidu et al. 2023).
Training each machine learning algorithm involved optimizing the model's parameters to achieve the best possible performance. For decision trees, recursive partitioning algorithms split the data on the most informative feature at each node. Parameters such as maximum tree depth, minimum samples per leaf, and the splitting criterion were tuned to control tree complexity and prevent overfitting. Random forest models were trained by constructing an ensemble of decision trees, each built on a bootstrap sample of the training data and a random subset of features. The number of trees in the ensemble, the maximum depth of individual trees, and the number of features considered at each split were optimized to balance model complexity and performance. For SVMs, classification training finds the hyperplane that separates the data points with the maximum margin; in the regression setting used here, the analogous goal is a function that fits the observations within a tolerance margin. Key parameters such as the kernel function (e.g. linear, polynomial, or radial basis function), the regularization parameter (C), and kernel parameters (e.g. gamma for the RBF kernel) were tuned for optimal performance (Saputra et al. 2024).
To evaluate the trained models and select the most suitable one for groundwater fluctuation prediction, several evaluation metrics were employed. Root Mean Square Error (RMSE) measures the average magnitude of the errors between predicted and observed groundwater fluctuation values, with lower values indicating better performance. The coefficient of determination (R2) represents the proportion of variance in the groundwater fluctuation data explained by the model, with higher values indicating a better fit. The Nash-Sutcliffe Efficiency (NSE) quantifies predictive performance relative to a baseline that always predicts the mean of the observations, with values closer to 1 indicating better performance.
Additionally, techniques such as cross-validation and ensemble methods were used to ensure the robustness and generalization capability of the trained models. Cross-validation partitions the training data into multiple subsets and iteratively trains and evaluates the model on different subsets, providing a more realistic estimate of performance on unseen data (Liu et al. 2021). Following this training and validation process and evaluating the models with multiple performance metrics, the most suitable algorithm for predicting groundwater fluctuations from GRACE satellite data and other relevant input features was selected. This supports reliable predictions for groundwater monitoring and management applications.
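The tuning and cross-validation procedure described in this section can be sketched with scikit-learn as follows. This is an illustrative sketch, not the configuration used in the study: the synthetic data, the parameter grids, and the use of `TimeSeriesSplit` (chosen here so that each fold respects time order) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for four lagged GRACE features and a groundwater target.
X_train = rng.normal(size=(100, 4))
y_train = X_train @ np.array([0.6, 0.2, 0.1, 0.1]) + rng.normal(scale=0.1, size=100)

# Hypothetical search grids over the parameters named in the text.
candidates = {
    "DT": (
        DecisionTreeRegressor(random_state=0),
        {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]},
    ),
    "RF": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [4, None], "max_features": [2, 4]},
    ),
    "SVM": (
        SVR(),
        {"kernel": ["linear", "rbf"], "C": [1.0, 10.0], "gamma": ["scale", 0.1]},
    ),
}

cv = TimeSeriesSplit(n_splits=3)  # each fold validates on later samples only
best = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    # best_score_ is a negated RMSE, so the sign is flipped for reporting.
    best[name] = (search.best_params_, -search.best_score_)
    print(name, best[name])
```

The cross-validated RMSE stored in `best` gives one basis for comparing the three model families before they are evaluated on the held-out testing set.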
2.3 Assessing the efficiency of methods
To evaluate the models used in the simulation, several statistical criteria were employed: Root Mean Square Error (RMSE), the coefficient of determination (R2), and Nash-Sutcliffe Efficiency (NSE). These metrics are defined by the following equations, respectively (Honarbakhsh et al. 2019).
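Written with the symbols defined in the text, the standard forms of these three metrics are given here as a reconstruction. The numbering (7) to (9) follows the in-text references; pairing Equation 8 with R2 (as the squared linear correlation) and Equation 9 with NSE is an assumption inferred from the symbols each equation is said to use.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2} \tag{7}\\
R^2 &= \left[\frac{\sum_{i=1}^{n}\left(O_i-\bar{O}\right)\left(P_i-\bar{P}\right)}
{\sqrt{\sum_{i=1}^{n}\left(O_i-\bar{O}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(P_i-\bar{P}\right)^2}}\right]^2 \tag{8}\\
\mathrm{NSE} &= 1 - \frac{\sum_{i=1}^{n}\left(\mathrm{OBS}_i-\mathrm{SIM}_i\right)^2}
{\sum_{i=1}^{n}\left(\mathrm{OBS}_i-\overline{\mathrm{OBS}}\right)^2} \tag{9}
\end{align}
```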
In Equations 7 and 8, n represents the total number of points under consideration, Ō and P̄ denote the means of the observed and predicted values, respectively, and Oi and Pi represent the observed and predicted values at point i. In Equation 9, OBSi is the observed value, SIMi is the predicted value, and OBSbar is the mean of the observed values. Fig. 2 illustrates the stages of the method.
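For readers who wish to reproduce the evaluation, the three criteria can be computed as in the sketch below. It uses the standard definitions (R2 taken as the squared linear correlation between observed and predicted values); the function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    o_mean = sum(obs) / len(obs)
    p_mean = sum(pred) / len(pred)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_p = sum((p - p_mean) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    o_mean = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - o_mean) ** 2 for o in obs)
    return 1.0 - residual / spread


# Hypothetical values: predictions carry a constant bias of +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(observed, predicted), r_squared(observed, predicted), nse(observed, predicted))
```

In this example the constant bias leaves the squared correlation at its maximum while lowering NSE, which is one reason the two criteria are reported together.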
3 Results and discussion
Changes in the groundwater level of South Khorasan Province, in centimeters, were determined from GRACE satellite data using three solutions (CSR, GFZ, and JPL); the results are presented in Fig. 3. The changes cover the years 2003 to 2017. The average piezometric level for the entire area is also shown in the figure.
To compare the GRACE solutions, a linear regression was performed between the changes derived from each of the three solutions (GFZ, JPL, and CSR) and the piezometric well data from 2003 to 2017, which were obtained from the Regional Water Company of South Khorasan Province. The results are presented in Figs. 4 to 6.
The linear regressions for the three solutions yield R2 values of 0.9065, 0.9368, and 0.8951; the highest, 0.9368, corresponds to the JPL solution. The JPL solution is therefore the most suitable for monitoring groundwater fluctuations in South Khorasan Province.
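The selection step can be sketched as follows: each GRACE solution is regressed against the piezometric series, and the solution with the highest R2 is retained. For a simple linear regression with an intercept, R2 equals the squared Pearson correlation, which is what the sketch computes. The function names and all numerical values are hypothetical and do not reproduce the study data.

```python
from typing import Dict, Sequence, Tuple


def regression_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """R2 of a simple linear regression of y on x (squared Pearson correlation)."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    var_x = sum((a - x_mean) ** 2 for a in x)
    var_y = sum((b - y_mean) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


def select_best_product(
    products: Dict[str, Sequence[float]], observed: Sequence[float]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the product with the highest R2 and all scores."""
    scores = {name: regression_r2(series, observed) for name, series in products.items()}
    return max(scores, key=scores.get), scores


# Hypothetical groundwater-level changes (cm); not the study data.
piezometer = [0.0, -1.0, -2.0, -3.0, -4.0, -5.0]
grace_products = {
    "JPL": [0.1, -0.9, -2.1, -3.0, -4.1, -4.9],
    "CSR": [0.5, -0.2, -2.8, -2.5, -4.9, -4.0],
    "GFZ": [0.3, -1.4, -1.6, -3.5, -3.6, -5.3],
}

best_name, r2_scores = select_best_product(grace_products, piezometer)
print(best_name, {k: round(v, 4) for k, v in r2_scores.items()})
```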
3.1 Calibration (Validation)
The three machine learning models (DT, RF, and SVM) were validated against data for the years 2013 to 2016 to determine which performs best for predicting groundwater fluctuations (Fig. 7).
The correlation between observational data and the model was also examined for all three models, and the values of error indices and model performance were obtained.
The results of predicting groundwater fluctuations using the three machine learning methods are presented in Table 1. The DT, SVM, and RF models have RMSE values of 0.655, 1.304, and 1.896, respectively, so the DT model estimates groundwater fluctuations with the lowest error. The R2 values (0.95, 0.79, and 0.65, respectively) confirm this ranking. The NSE coefficient indicates the predictive performance of the models: the closer it is to 1, the more suitable the model is for prediction.
According to Table 1, the NSE coefficients for the DT, SVM, and RF models are 0.96, 0.86, and 0.71, respectively, indicating that the DT model performs best in predicting groundwater levels in the region.
With the decision tree selected as the model for predicting groundwater levels in South Khorasan Province, groundwater fluctuations were predicted until 2028, with predicted values averaged over 12 months. Performance and error indices were also calculated. The Nash-Sutcliffe coefficient exceeded 0.7, which indicates good predictive performance (Fig. 11).
Furthermore, the 95% confidence interval was determined as upper and lower limits, indicating the uncertainty and the range from maximum to minimum of the prediction. This can serve as a tool for analyzing uncertainty related to factors that affect groundwater fluctuations, such as climate change and precipitation. The prediction results indicate a decrease in groundwater levels from 2017 to 2020, which underlines the need for groundwater management, followed by a significant decline from 2020 to 2028.
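The 12-month averaging and the upper and lower limits can be illustrated with the sketch below. The text does not state how the interval was derived, so the normal approximation used here (prediction plus or minus 1.96 times the standard deviation of validation residuals on the same annual scale) is an assumption, as are the function names and all values.

```python
import statistics
from typing import List, Sequence, Tuple


def annual_means(monthly: Sequence[float]) -> List[float]:
    """Average consecutive blocks of 12 monthly predictions."""
    if len(monthly) % 12 != 0:
        raise ValueError("expected a whole number of years")
    return [statistics.fmean(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]


def interval_95(
    predictions: Sequence[float], residuals: Sequence[float]
) -> List[Tuple[float, float]]:
    """Lower and upper limits assuming normally distributed residuals."""
    half_width = 1.96 * statistics.stdev(residuals)
    return [(p - half_width, p + half_width) for p in predictions]


# Hypothetical monthly predictions (cm) for two years and validation residuals.
monthly_predictions = [-1.0] * 12 + [-2.0] * 12
validation_residuals = [-1.0, 1.0, -1.0, 1.0]

annual = annual_means(monthly_predictions)
band = interval_95(annual, validation_residuals)
for value, (lower, upper) in zip(annual, band):
    print(round(lower, 2), round(value, 2), round(upper, 2))
```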
Table 2 presents the prediction evaluation results for the decision tree model. With an R2 of 0.87 and a Nash-Sutcliffe Efficiency (NSE) of 0.9, the DT model exhibits satisfactory performance in predicting groundwater fluctuations in South Khorasan Province, which demonstrates its effectiveness in predicting groundwater level fluctuations from GRACE satellite data and its capability to capture the complex spatiotemporal dynamics governing groundwater systems. The DT model's ability to handle non-linear relationships and its inherent interpretability are particularly valuable in the context of groundwater modeling. Groundwater fluctuations are influenced by many interacting factors, including precipitation patterns, land use, aquifer characteristics, and anthropogenic activities. The decision tree algorithm's recursive partitioning approach can represent such relationships, enabling accurate predictions even in the presence of nonlinearities. Furthermore, the hierarchical structure of the DT model offers insight into the relative importance of different input features. Analyzing the feature importance rankings gives a deeper understanding of the key drivers of groundwater dynamics within the study area. This knowledge can inform targeted management strategies, such as prioritizing conservation efforts in areas with high sensitivity to specific factors or implementing focused monitoring programs for critical variables.
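A feature importance ranking of the kind mentioned above can be obtained from a fitted tree as in the following sketch, which uses the `feature_importances_` attribute of scikit-learn's `DecisionTreeRegressor`. The data are synthetic, and the feature names (ASCII stand-ins for the lagged GRACE inputs) and the tree depth are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for the four lagged GRACE features.
X = rng.normal(size=(200, 4))
# Synthetic target dominated by the first column (the unlagged feature).
y = 2.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.05, size=200)

feature_names = ["dsigma(t)", "dsigma(t-1)", "dsigma(t-2)", "dsigma(t-3)"]
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; larger values mean more influential splits.
ranking = sorted(
    zip(feature_names, tree.feature_importances_), key=lambda item: item[1], reverse=True
)
for name, importance in ranking:
    print(name, round(float(importance), 3))
```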
3.2 Practical implications for groundwater management in arid regions
Our research holds significant practical implications for groundwater management strategies in arid and semi-arid regions, where groundwater resources are critical for sustaining agricultural, industrial, and domestic activities. The accurate prediction of groundwater fluctuations facilitated by our approach can inform targeted and proactive management interventions tailored to the specific challenges faced in these water-stressed environments. In regions like South Khorasan Province, where surface water resources are scarce and precipitation patterns are highly variable, groundwater serves as a vital buffer against drought and water scarcity. The ability to forecast groundwater levels with high accuracy enables water managers to anticipate potential shortfalls and implement timely mitigation measures, such as promoting water conservation practices, implementing groundwater extraction limits, or exploring alternative water sources. Our study's findings can contribute to the development of early warning systems for groundwater depletion, which are particularly crucial in arid regions. By leveraging the spatiotemporal predictions generated by our decision tree model, authorities can identify areas at risk of rapid groundwater decline and prioritize intervention efforts accordingly. This targeted approach can optimize the allocation of resources and ensure that critical areas receive the necessary attention before reaching unsustainable groundwater extraction levels. Furthermore, our research can inform the design and optimization of groundwater monitoring networks in arid regions. The spatial patterns revealed by our model can guide the strategic placement of piezometers or other monitoring infrastructure, ensuring that areas with high groundwater sensitivity or rapid fluctuations are adequately covered. This data-driven approach can enhance the cost-effectiveness and efficiency of monitoring programs, which are often resource-constrained in arid environments. 
Importantly, our study contributes to the broader goal of achieving sustainable groundwater management in arid regions. By providing accurate and reliable forecasts of groundwater dynamics, our approach can support the development of long-term management plans that balance the needs of various stakeholders (e.g. agriculture, industry, domestic use) while ensuring the preservation of groundwater resources for future generations. Integrating our research findings with existing groundwater management frameworks, such as Integrated Water Resources Management (IWRM) or Managed Aquifer Recharge (MAR) strategies, can further enhance their effectiveness in arid regions. Our predictive capabilities can inform the design and implementation of aquifer recharge schemes, optimizing the timing and locations of recharge efforts based on anticipated groundwater fluctuations. Additionally, our research can contribute to raising awareness among local communities and policymakers about the importance of sustainable groundwater management in arid regions. By providing clear and data-driven insights into groundwater dynamics, our findings can facilitate informed decision-making and foster collaboration between various stakeholders, promoting a shared understanding of the challenges and the urgency of implementing effective management strategies. Overall, our study represents a significant step towards leveraging advanced data analytics and machine learning techniques to address the pressing challenges of groundwater management in arid regions. By combining the strengths of satellite remote sensing and cutting-edge modeling approaches, our research provides a powerful tool for informed decision-making and contributes to the broader goal of ensuring long-term water security in these vulnerable environments.
4 Conclusion
This study investigated the effectiveness of machine learning techniques, particularly the Decision Tree (DT) model, in leveraging GRACE satellite data for accurate prediction of groundwater level fluctuations. Of the DT, SVM, and RF models tested, the DT model performed best (R2 = 0.87, NSE = 0.9 in the prediction stage), capturing the complex spatiotemporal dynamics of groundwater systems better than the other models. Its ability to handle non-linear relationships and provide interpretability makes it well suited to groundwater modeling, where numerous interacting factors influence fluctuations. Additionally, the model's structure offers insights into the relative importance of input features, informing targeted management strategies and focused monitoring efforts.
Our findings highlight the potential of integrating GRACE data and machine learning for long-term groundwater forecasting. This facilitates proactive decision-making and interventions to mitigate declining groundwater levels or droughts. The spatiotemporal predictions can guide the optimization of monitoring networks and resource allocation in vulnerable areas. The implications extend beyond the study area, contributing to sustainable groundwater management in arid regions. Our approach can inform early warning systems, aquifer recharge initiatives, demand-side management policies, and the exploration of alternative water sources.
Traditionally, groundwater level measurement has relied on observation wells. However, these are unevenly distributed across regions, costly, and time-consuming to operate. GRACE satellite data, with their broad spatial coverage, offer a valuable alternative for monitoring groundwater changes. The successful application of machine learning in this study highlights its potential for efficient groundwater fluctuation prediction.
While this research presents a significant step forward, future avenues include integrating additional data sources, exploring ensemble and hybrid modeling approaches, addressing uncertainty quantification, expanding spatial and temporal scales, and incorporating stakeholder perspectives. Addressing these directions can further refine our understanding of groundwater dynamics and facilitate the translation of scientific findings into actionable strategies. Ultimately, our research contributes to achieving long-term water security in arid regions by providing accurate and reliable forecasts of groundwater dynamics. By combining satellite remote sensing and advanced modeling techniques, our study offers a powerful tool for informed decision-making, fostering collaboration between researchers, policymakers, and local communities in preserving this vital resource for future generations.
Afraz A, Eftekhari M, Akbari M, et al. 2021. Application assessment of GRACE and CHIRPS data in the Google Earth Engine to investigate their relation with groundwater resource changes (Northwestern region of Iran). Journal of Groundwater Science and Engineering, 9(2): 102−113. DOI: 10.19637/j.cnki.2305-7068.2021.02.002.
Ali S, Liu D, Fu Q, et al. 2021. Improving the resolution of GRACE data for spatio-temporal groundwater storage assessment. Remote Sensing, 13(17): 3513. DOI: 10.3390/rs13173513.
Barros RC, Basgalupp MP, de Carvalho ACPLF, et al. 2012. A survey of evolutionary algorithms for decision-tree induction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(3): 291−312. DOI: 10.1109/TSMCC.2011.2157494.
Bhavsar H, Panchal MH. 2012. A review on support vector machine for data classification. International Journal of Advanced Research in Computer Engineering and Technology (IJARCET), 1(10): 185−189.
Charbuty B, Abdulazeez A. 2021. Classification based on decision tree algorithm for machine learning. Journal of Applied Science and Technology Trends, 2(1): 20−28. DOI: 10.38094/jastt20165.
Chen JL. 2019. Satellite gravimetry and mass transport in the earth system. Geodesy and Geodynamics, 10(5): 402−415. DOI: 10.1016/j.geog.2018.07.001.
Chen JL, Cazenave A, Dahle C, et al. 2022. Applications and challenges of GRACE and GRACE follow-on satellite gravimetry. Surveys in Geophysics, 43(1): 305−345. DOI: 10.1007/s10712-021-09685-x.
Chen JL, Famiglietti JS, Scanlon BR, et al. 2016. Groundwater storage changes: Present status from GRACE observations. Surveys in Geophysics, 37(2): 397−417. DOI: 10.1007/s10712-015-9332-4.
Coelho VHR, Bertrand GF, Montenegro SMGL, et al. 2018. Piezometric level and electrical conductivity spatiotemporal monitoring as an instrument to design further managed aquifer recharge strategies in a complex estuarial system under anthropogenic pressure. Journal of Environmental Management, 209: 426−439. DOI: 10.1016/j.jenvman.2017.12.078.
Dong HW, Yang LM, Wang X. 2021. Robust semi-supervised support vector machines with Laplace kernel-induced correntropy loss functions. Applied Intelligence, 51(2): 819−833. DOI: 10.1007/s10489-020-01865-3.
Eftekhari M, Madadi K, Akbari M. 2019. Monitoring the fluctuations of the Birjand Plain aquifer using the GRACE satellite images and the GIS spatial analyses. Watershed Management Research Journal, 32(4): 51−65. (In Persian). DOI: 10.22092/wmej.2019.126204.1218.
Fawagreh K, Gaber MM, Elyan E. 2014. Random forests: From early developments to recent advancements. Systems Science and Control Engineering, 2(1): 602−609. DOI: 10.1080/21642583.2014.956265.
Feng W, Shum C, Zhong M, et al. 2018. Groundwater storage changes in China from satellite gravity: An overview. Remote Sensing, 10(5): 674. DOI: 10.3390/rs10050674.
Font-Capo J, Pujades E, Vàzquez-Suñé E, et al. 2015. Assessment of the barrier effect caused by underground constructions on porous aquifers with low hydraulic gradient: A case study of the metro construction in Barcelona, Spain. Engineering Geology, 196: 238−250. DOI: 10.1016/j.enggeo.2015.07.006.
Frappart F, Ramillien G. 2018. Monitoring groundwater storage changes using the Gravity Recovery and Climate Experiment (GRACE) satellite mission: A review. Remote Sensing, 10(6): 829. DOI: 10.3390/rs10060829.
Genuer R, Poggi JM. 2020. Random forests. Cham: Springer International Publishing: 33−55. DOI: 10.1007/978-3-030-56485-8_3.
Gilbert J, Boateng C, Aryee J, et al. 2023. A systematic review of machine learning algorithms in groundwater level simulations and forecasting. Preprint.
Gleeson T, Cuthbert M, Ferguson G, et al. 2020. Global groundwater sustainability, resources, and systems in the anthropocene. Annual Review of Earth and Planetary Sciences, 48: 431−463. DOI: 10.1146/annurev-earth-071719-055251.
Gong CC, Cook PG, Therrien R, et al. 2023. On groundwater recharge in variably saturated subsurface flow models. Water Resources Research, 59(9): e2023wr034920. DOI: 10.1029/2023wr034920.
Gong CC, Zhang ZY, Wang WK, et al. 2021. An assessment of different methods to determine specific yield for estimating groundwater recharge using lysimeters. Science of the Total Environment, 788: 147799. DOI: 10.1016/j.scitotenv.2021.147799.
Haileslassie T, Gebremedhin K. 2015. Hazards of heavy metal contamination in ground water. International Journal of Technology Enhancements and Emerging Engineering Research, 3(2): 1−6.
Hilario M, Kalousis A, Pellegrini C, et al. 2006. Processing and classification of protein mass spectra. Mass Spectrometry Reviews, 25(3): 409−449. DOI: 10.1002/mas.20072.
Honarbakhsh A, Azma A, Nikseresht F, et al. 2019. Hydro-chemical assessment and GIS-mapping of groundwater quality parameters in semi-arid regions. Journal of Water Supply: Research and Technology-Aqua, 68(7): 509−522. DOI: 10.2166/aqua.2019.009.
Humphrey V, Rodell M, Eicker A. 2023. Using satellite-based terrestrial water storage data: A review. Surveys in Geophysics, 44(5): 1489−1517. DOI: 10.1007/s10712-022-09754-9.
Joachims T. 2012. Learning to classify text using support vector machines (Vol. 668). Springer Science and Business Media. DOI: 10.1007/978-1-4615-0907-3.
Kalbus E, Reinstorf F, Schirmer M. 2006. Measuring methods for groundwater–surface water interactions: A review. Hydrology and Earth System Sciences, 10(6): 873−887. DOI: 10.5194/hess-10-873-2006.
Khanlari G, Heidari M, Momeni AA, et al. 2012. The effect of groundwater overexploitation on land subsidence and sinkhole occurrences, western Iran. Quarterly Journal of Engineering Geology and Hydrogeology, 45(4): 447−456. DOI: 10.1144/qjegh2010-069.
King Z, Farrington J, Utley M, et al. 2022. Machine learning for real-time aggregated prediction of hospital admission for emergency patients. NPJ Digital Medicine, 5(1): 104. DOI: 10.1038/s41746-022-00649-y.
Kumar D, Bhattacharjya RK. 2021. GRNN Model for prediction of groundwater fluctuation in the state of Uttarakhand of India using GRACE data under limited bore well data. Journal of Hydroinformatics, 23(3): 567−588. DOI: 10.2166/hydro.2021.108.
Li FP, Wang ZT, Chao NF, et al. 2018. Assessing the influence of the Three Gorges Dam on hydrological drought using GRACE data. Water, 10(5): 669. DOI: 10.3390/w10050669.
Li PY, Wu JH, Zhou WF, et al. 2023. Groundwater contamination and induced risk and hazard in a Karst aquifer. Environmental Earth Sciences. Cham: Springer International Publishing: 179−256. DOI: 10.1007/978-3-031-48427-8_7.
Liu Q, Gui DW, Zhang L, et al. 2022. Simulation of regional groundwater levels in arid regions using interpretable machine learning models. Science of the Total Environment, 831: 154902. DOI: 10.1016/j.scitotenv.2022.154902.
Liu W, Yu HJ, Yang LS, et al. 2021. Deep learning-based predictive framework for groundwater level forecast in arid irrigated areas. Water, 13(18): 2558. DOI: 10.3390/w13182558.
Longuevergne L, Scanlon BR, Wilson CR. 2010. GRACE hydrological estimates for small basins: Evaluating processing approaches on the high Plains aquifer, USA. Water Resources Research, 46(11): e2009wr008564. DOI: 10.1029/2009wr008564.
Louppe G. 2014. Understanding random forests: From theory to practice. Ph.D. thesis. University of Liège: 1407.
Maimon OZ, Rokach L. 2014. Data mining with decision trees: Theory and applications: 81. World Scientific.
Matin SS, Farahzadi L, Makaremi S, et al. 2018. Variable selection and prediction of uniaxial compressive strength and modulus of elasticity by random forest. Applied Soft Computing, 70: 980−987. DOI: 10.1016/j.asoc.2017.06.030.
Meyer U, Sosnica K, Arnold D, et al. 2019. SLR, GRACE and swarm gravity field determination and combination. Remote Sensing, 11(8): 956. DOI: 10.3390/rs11080956.
Moore S, Fisher JB. 2012. Challenges and opportunities in GRACE-based groundwater storage assessment and management: An example from Yemen. Water Resources Management, 26(6): 1425−1453. DOI: 10.1007/s11269-011-9966-z.
Patel HH, Prajapati P. 2018. Study and analysis of decision tree based classification algorithms. International Journal of Computer Sciences and Engineering, 6(10): 74−78. DOI: 10.26438/ijcse/v6i10.7478.
Rai K, Devi MS, Guleria A. 2016. Decision tree based algorithm for intrusion detection. International Journal of Advanced Networking and Applications, 7(4): 2828.
Rajaee G, Hajizadeh F, Salman MA, et al. 2011. An analysis of physical-chemical properties and quality of underground agricultural and drinking water in Southern Khorasan Province. Environmental Researches, 3(5): 13−24. (In Persian). https://dorl.net/dor/20.1001.1.20089597.1391.3.5.3.4
Rivera-Lopez R, Canul-Reich J, Mezura-Montes E, et al. 2022. Induction of decision trees as classification models through metaheuristics. Swarm and Evolutionary Computation, 69: 101006. DOI: 10.1016/j.swevo.2021.101006.
Roy DK, Munmun TH, Paul CR, et al. 2023. Improving forecasting accuracy of multi-scale groundwater level fluctuations using a heterogeneous ensemble of machine learning algorithms. Water, 15(20): 3624. DOI: 10.3390/w15203624.
Sahour H, Sultan M, Abdellatif B, et al. 2022. Identification of shallow groundwater in arid lands using multi-sensor remote sensing data and machine learning algorithms. Journal of Hydrology, 614: 128509. DOI: 10.1016/j.jhydrol.2022.128509.
Sansone M, Fusco R, Pepino A, et al. 2013. Electrocardiogram pattern recognition and analysis based on artificial neural networks and support vector machines: A review. Journal of Healthcare Engineering, 4(4): 465−504. DOI: 10.1260/2040-2295.4.4.465.
Saputra DCE, Ma'arif A, Sunat K. 2024. Optimizing predictive performance: Hyperparameter tuning in stacked multi-kernel support vector machine random forest models for diabetes identification. Journal of Robotics and Control (JRC), 4(6): 896−904. DOI: 10.18196/jrc.v4i6.20898.
Schelter LN. 2021. On groundwater monitoring using machine learning and satellite remote sensing. Ph.D. thesis. Rheinisch-Westfälische Technische Hochschule Aachen.
Seidu J, Ewusi A, Kuma JSY, et al. 2023. Impact of data partitioning in groundwater level prediction using artificial neural network for multiple wells. International Journal of River Basin Management, 21(4): 639−650. DOI: 10.1080/15715124.2022.2079653.
Seni G, Elder JF. 2010. Ensemble Methods in Data Mining: Improving accuracy through combining predictions. Cham: Springer International Publishing. DOI: 10.1007/978-3-031-01899-2.
Seo JY, Lee SI. 2021. Predicting changes in spatiotemporal groundwater storage through the integration of multi-satellite data and deep learning models. IEEE Access, 9: 157571−157583. DOI: 10.1109/ACCESS.2021.3130306.
[53]
Shao Y, Lunetta RS. 2012. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing, 70: 78−87. DOI: 10.1016/j.isprsjprs.2012.04.001.
[54]
Shouval R, Bondi O, Mishan H, et al. 2014. Application of machine learning algorithms for clinical predictive modeling: A data-mining approach in SCT. Bone Marrow Transplantation, 49(3): 332−337. DOI: 10.1038/bmt.2013.146.
[55]
Singha S, Pasupuleti S, Singha SS, et al. 2021. Prediction of groundwater quality using efficient machine learning technique. Chemosphere, 276: 130265. DOI: 10.1016/j.chemosphere.2021.130265.
[56]
Springer A, Eicker A, Bettge A, et al. 2017. Evaluation of the water cycle in the European COSMO-REA6 reanalysis using GRACE. Water, 9(4): 289. DOI: 10.3390/w9040289.
[57]
Sun AY. 2013. Predicting groundwater level changes using GRACE data. Water Resources Research, 49(9): 5900−5912. DOI: 10.1002/wrcr.20421.
[58]
Swenson S, Wahr J. 2002. Methods for inferring regional surface-mass anomalies from Gravity Recovery and Climate Experiment (GRACE) measurements of time-variable gravity. Journal of Geophysical Research: Solid Earth, 107(B9). DOI: 10.1029/2001jb000576.
[59]
Wang J, Lu SY, Wang SH, et al. 2022. A review on extreme learning machine. Multimedia Tools and Applications, 81(29): 41611−41660. DOI: 10.1007/s11042-021-11007-7.
[60]
Wang YH, Gupta HV. 2024. A mass-conserving-perceptron for machine-learning-based modeling of geoscientific systems. Water Resources Research, 60(4): e2023wr036461. DOI: 10.1029/2023wr036461.
[61]
Werth S, Güntner A, Schmidt R, et al. 2009. Evaluation of GRACE filter tools from a hydrological perspective. Geophysical Journal International, 179(3): 1499−1515. DOI: 10.1111/j.1365-246X.2009.04355.x.
[62]
Wilhite DA, Glantz MH. 1985. Understanding the drought phenomenon: The role of definitions. Water International, 10(3): 111−120. DOI: 10.1080/02508068508686328.
[63]
Wouters B, Bonin JA, Chambers DP, et al. 2014. GRACE, time-varying gravity, Earth system dynamics and climate change. Reports on Progress in Physics, 77(11): 116801. DOI: 10.1088/0034-4885/77/11/116801.
[64]
Yaman A, Cengiz MA. 2021. The effects of kernel functions and optimal hyperparameter selection on support vector machines. Journal of New Theory, (34): 64−71.
[65]
Yang YT, Long D, Guan HD, et al. 2014. GRACE satellite observed hydrological controls on interannual and seasonal variability in surface greenness over mainland Australia. Journal of Geophysical Research: Biogeosciences, 119(12): 2245−2260. DOI: 10.1002/2014jg002670.
Zhang XM, Wang N, Cao LS, et al. 2024. Analysis of the contribution of rainfall to recharge in the Mu Us Desert (China) based on lysimeter data. Hydrogeology Journal, 32(1): 279−288. DOI: 10.1007/s10040-023-02750-2.
[68]
Zhu FB. 2018. A classification algorithm of CART decision tree based on MapReduce attribute weights. International Journal of Performability Engineering, 14(1): 17−25. DOI: 10.23940/ijpe.18.01.p3.1725.
[69]
Ziegler A, König IR. 2014. Mining data with random forests: Current options for real-world applications. WIREs Data Mining and Knowledge Discovery, 4(1): 55−63. DOI: 10.1002/widm.1114.
Rights and permissions: Journal of Groundwater Science and Engineering Editorial Office