A state-of-the-art Fuzzy Nonlinear Additive Regression (FNAR) model for groundwater level prediction

Sepideh Zeraati Neyshabouri; Abbas Khashei-Siuki; Mohammad Ghasem Akbari

doi:10.26599/JGSE.2026.9280074

J. Groundw. Sci. Eng. ›› 2026, Vol. 14 ›› Issue (1) :83 -99. DOI: 10.26599/JGSE.2026.9280074

Research Article

research-article

A state-of-the-art Fuzzy Nonlinear Additive Regression (FNAR) model for groundwater level prediction

Author information +

History +

PDF (1652KB)

Abstract

Groundwater modeling remains challenging due to heterogeneity and complexity of aquifer systems, necessitating endeavors to quantify Groundwater Levels (GWL) dynamics to inform policymakers and hydrogeologists. This study introduces a novel Fuzzy Nonlinear Additive Regression (FNAR) model to predict monthly GWL in an unconfined aquifer in eastern Iran, using a 19-year (1998–2017) dataset from 11 piezometric wells. Under three distinct scenarios with progressively increasing input complexity, the study utilized readily available climate data, including Precipitation (Prc), Temperature (Tave), Relative Humidity (RH), and Evapotranspiration (ETo). The dataset was split into training (70%) and validation (30%) subsets. Results showed that among three input scenarios, Scenario 3 (Sc3, incorporating all four variables) achieved the best predictive performance, with RMSE ranging from 0.305 m to 0.768 m, MAE from 0.203 m to 0.522 m, NSE from 0.661 to 0.980, and PBIAS from 0.771% to 0.981%, indicating low bias and high reliability. However, Sc2 (excluding ETo) with RMSE ranging from 0.4226 m to 0.9909 m, MAE from 0.3418 m to 0.8173 m, NSE from 0.2831 to 0.9674, and PBIAS from −0.598% to 0.968% across different months offers practical advantages in data-scarce settings. The FNAR model outperforms conventional Fuzzy Least Squares Regression (FLSR) and holds promise for GWL forecasting in data-scarce regions where physical or numerical models are impractical. Future research should focus on integrating FNAR with deep learning algorithms and real-time data assimilation expanding applications across diverse hydrogeological settings.

Graphical abstract

Keywords

Birjand aquifer / Data-scarce regions / Fuzzy-based approach / Groundwater table / Novel statistical model / Soft computing

Cite this article

Download citation ▾

Sepideh Zeraati Neyshabouri, Abbas Khashei-Siuki, Mohammad Ghasem Akbari. A state-of-the-art Fuzzy Nonlinear Additive Regression (FNAR) model for groundwater level prediction. J. Groundw. Sci. Eng., 2026, 14(1): 83-99 DOI:10.26599/JGSE.2026.9280074

登录浏览全文

4963

注册一个新账户忘记密码

Introduction

Groundwater plays a crucial role in sustaining agriculture, economic development, and safe water supply (Kuang et al. 2024; Zaresefat and Derakhshani, 2023). Its dynamic nature allows adaption to both short-term and long-term fluctuations in climate, extraction rates, and land use patterns (Rahimi and Ebrahimi, 2023). Yet, rapid urbanization and anthropogenic activities disrupt natural recharge processes, threatening sustainability (Kuang et al. 2024). Population growth, poor irrigation, industrialization, deforestation, and climate change continue to degrade groundwater quantity and quality, particularly in arid and semi-arid regions (Condon et al. 2021; Hoogesteger et al. 2022; Zaresefat and Derakhshani, 2023).

Groundwater level (GWL) reflects water availability and accessibility (Tao et al. 2022). Understanding its temporal variationcan can guide sustainable water management and policy decisions (Lall et al. 2020; Tao et al. 2022). However, GWL is driven by the interplay of climatic, topographic, and hydrogeological factors, rendering its simulation a challenging task (Rahimi and Ebrahimi, 2023; Zaresefat and Derakhshani, 2023). Although data-driven models effectively predict GWL, their heavy requirements and associated costs hinder their use in poorly monitored regions (Sun et al. 2022; Malakar et al. 2022). Consequently, models that require less inputs yet retaining accuracy have gained attention. Hybrid fuzzy systems, integrating fuzzy logic with algorithmic methods, have proven efficient in various disciplines, including groundwater management (Ramezani, 2022).

Fuzzy logic, a soft computing tool for managing uncertainty, vagueness, and partial truth (Kambalimath and Deka, 2020), was first applied in hydrology by Bogardi et al. (1983) and has since been widely adopted in various hydrologic areas, including GWL prediction (Theodoridou et al. 2017; Ang et al. 2023; Kambalimath and Deka, 2020). Hybrid fuzzy models have emerged as powerful tools for simulating and forecasting GWL, offering accurate and interpretable results with minimal data requirements that make them suitable for groundwater management in data-scarce regions (Saikrishnamacharyulu et al. 2022; Samani et al. 2022; Varouchakis et al. 2019). Successful applications have been reported in various geographical regions. For example, Ahmadifar et al. (2025) introduced a hybrid vine copula-fuzzy model that simulated monthly GWL in the Najafabad aquifer, Iran, achieving R² values of 0.83 and 0.88 using C-vine dependencies and fuzzy α-cuts for confidence-range outputs tailored to water management. Bhadani et al. (2024) integrated Fuzzy C-Means clustering with a Sugeno-type Fuzzy Inference System (FIS) and Invasive Weed Optimization (IWO), predicting GWL fluctuations (R = 0.89, nRMSE = 0.18, bias = 0.08) using precipitation, humidity, and GWL lag, outperforming 17 benchmarks on 30% test data. Also, a Coactive Neuro-Fuzzy Inference System (CANFIS) with a bagging ensemble forecasted multi-lead GWL in Texas/Florida wells using inputs (GWL, precipitation, temperature, surface water) and achieved superior monthly forecast accuracy, highlighting the importance of careful predictor selection (Boo et al. 2024). A fuzzy-AHP framework predicted GWL fluctuations in unconfined aquifers using physical models and 18 weighted factors, yielding R² = 0.4035 overall and R² = 0.9422 for peak changes in Sarpol-e Zahab, Iran (Rashidi Gooya et al. 2024). Moreover, Samantaray et al. (2022) explored the Cascade Feedforward Backpropagation Neural Network (CFBPNN), Adaptive Neuro-fuzzy inference System (ANFIS), and hybrid ANFIS-GWO models for GWL prediction within two Indian Balangir watersheds using eighteen-year dataset of precipitation, mean discharge, evapotranspiration, and extreme temperatures. The ANFIS-GWO outperformed the others (NSE = 0.9745, WI = 0.9763, RMSE = 0.0450), with evapotranspiration identified as a key input enhancing predictive accuracy. NavaleV and Mhaske (2023) successfully applied Adaptive Neuro-fuzzy Inference Systems (ANFIS) in forecasting GWL using hydrogeological and meteorological factors and emphasized that optimal input selection and model configuration are key to improving predictive performance.

Despite these advances, critical research gaps persist. Most existing fuzzy hybrid models, including ANFIS-GWO (Samantaray et al. 2022), FIS-IWO (Bhadani et al. 2024), and CANFIS (Boo et al. 2024), require antecedent GWL measurements as inputs, limiting their applicability in ungauged or poorly monitored aquifers. In addition, complex hybrid architectures demand extensive hyperparameter tuning and auxiliary data (e.g., discharge, soil properties) that are rarely available in developing arid regions. Importantly, few studies evaluate the trade-off between prediction accuracy and data acquisition feasibility, a key element for real-world deployment. Therefore, this study introduces the novel Fuzzy Nonlinear Additive Regression (FNAR) model for GWL prediction with three key contributions: (1) to present the first application of FNAR in hydrological modeling; (2) to eliminate dependency on historical GWL data by relying exclusively on widely available meteorological variables (precipitation, temperature, relative humidity, and evapotranspiration); and (3) to identify the most practical input configuration (i.e., balancing predictive performance with minimal data requirements) thereby enhancing model applicability in data-scarce arid regions. A key benefit of fuzzy hybrid methods, including the FNAR framework, is their ability to address nonlinear and complex problems, and to handle uncertainties arising from insufficient data through fuzzy membership functions, provide robust models with interpretable predictions, and operate effectively with limited datasets. It is hypothesized that the FNAR model is an appropriate choice for GWL modeling characterized by non-linearity, complexity, and spatial and temporal variability. This innovative approach is expected to assist planners, regulators, lawmakers, and municipal governments in effectively managing groundwater resources.

1 Study area

The study was conducted in the Birjand Plain aquifer in South Khorasan Province, eastern Iran (32°30'–33°00'N, 58°45'–59°41'E) (Fig. 1). This region represents groundwater systems in arid regions of Central and West Asia facing severe water stress, with declining water tables exceeding 0.5 m/a threatening agricultural sustainability and urban water security. The aquifer supports approximately 200,000 inhabitants and extensive irrigated agriculture covering more than 40,000 hectares, making sustainable management critically important for regional food security and livelihoods. The situation of this study area is broadly applicable to similar arid groundwater basins across Iran, where more than 400 plains are experiencing depletion, as well as throughout Central Asia and the Middle East region (Aryafar et al. 2020).

The basin encompasses 3,155 km² (1,045 km² alluvial plain, 2,110 km² mountains), with elevations from 2,787 m to 1,240 m. The climate is arid, with mean annual precipitation of 177 mm (CV=45%, declining by 1.2 mm/a) and mean temperature of 16.5°C. Precipitation is highly seasonal: 81% in winter-spring, 18% in autumn, <1% in summer, with a March maximum (35 mm) and an August minimum (<1 mm). Temperatures range from 40°C (summer) to −15°C (winter), with 15–20°C diurnal variations.

Hydrogeologically, the eastern section has thin alluvium (~50 m) with limited potential; the central plain is the most productive with ~150 m alluvial thickness, hydraulic conductivity of 50–100 m/d, and saturated thickness of 80–120 m; the western section has finer sediments with reduced transmissivity. The unconfined aquifer has water table depths of 20–80 m. Natural recharge (80–100 MCm/a) from mountain precipitation is substantially exceeded by withdrawal (150–180 MCm/a), causing ~0.6 m/a groundwater decline.

2 Materials and methods

2.1 Data setsand quality control

The study utilizes a 19-year (March 1998 to March 2017) monthly dataset comprising GWL measurements from 11 piezometric wells across the alluvial plain of the Birjand aquifer. Fig. 2. displays the distribution of meteorological variables including Tave, RH, Prc, and ETo, across the study period. Well locations were selected to ensure representative spatial coverage, data continuity, and quality. Meteorological inputs, including average air temperature (Tave, °C), total precipitation (Prc, mm), relative humidity (RH, %), and reference evapotranspiration (ETo, mm/d), were obtained from standard weather stations operated by the Iran Meteorological Organization, while GWL data (m above sea level) were sourced from the South Khorasan Regional Water Authority.

The 1998–2017 dataset was chosen to capture key hydrological variability, including multi-year cycles and droughts, using consistent measurements from 11 stations. Hydrological and management conditions were stable, enabling assessment of aquifer behavior without confounding policy effects. Post-2017 interventions were excluded to avoid non-stationarity, as aquifer responses manifest over several years. This period optimally balances temporal coverage with data consistency for validated predictive modeling.

The overall missing data rate was less than 3%. Gaps of ≤ 2 months were filled using linear interpolation; longer gaps were imputed using the corresponding monthly mean from adjacent years. Outliers (<1.5% of observations), defined as values exceeding ±3 standard deviations from the monthly mean, were replaced with the monthly median. To address temporal inconsistencies due to non-uniform observation dates across stations, all records were aggregated to monthly averages, ensuring temporal alignment across the dataset. The fuzzy regression approach employed in this study further mitigates data uncertainties by representing GWL as triangular fuzzy numbers, thereby enhancing model robustness against measurement errors and natural variability. Following preprocessing, the dataset was chronologically split into training (70%, March 1998–March 2011) and validation (30%, April 2011–March 2017) subsets to preserve temporal independence and enable realistic performance evaluation.

2.2 Model development for GWL prediction

The three input scenarios were designed to evaluate the trade-off between model accuracy and data accessibility, progressing from minimal to comprehensive meteorological inputs (Table 2). (1) Scenario 1 (Sc1) employs only precipitation (Prc) and temperature (Tave) as the minimal input set, representing the most universally available meteorological variables. This configuration tests whether fundamental water balance drivers [i.e., recharge (via Prc) and evapotranspiration (via Tave)] are sufficient for acceptable GWL prediction in data-scarce regions; (2) Scenario 2 (Sc2) adds relative humidity (RH) to the Sc1 set to account for atmospheric moisture effects on actual evapotranspiration. RH is routinely measured at many basic weather stations and improves the physical representation of moisture availability without requiring specialized instrumentation, and (3) Scenario 3 (Sc3) incorporates reference evapotranspiration (ETo) to directly represent potential water loss. This design enables assessment of marginal performance gains per additional variable, with the operational objective of determining the minimum data requirements for achieving accurate GWL prediction via the FNAR model. Critically, lagged GWL was excluded from all scenarios to ensure applicability in ungauged or poorly monitored aquifers, a key innovation distinguishing FNAR from most existing models that rely on historical groundwater observations.

A detailed review of the basic concepts of fuzzy numbers and the fuzzy nonlinear incremental regression (FNAR) model will be presented in the next section. In addition, to facilitate the performance review and evaluation of the FNAR model, the fuzzy least squares regression (FLSR) method is used with a brief explanation below.

2.3 Model development

2.3.1 Basic concepts on fuzzy numbers

A fuzzy set

$ \tilde{A} $

$ \rm{R} $

(the real line) is defined by its membership function

$ {\mu }_{\tilde{A}}\colon \rm{R}\longrightarrow \left[0, 1\right]. $

It is classified as a fuzzy number if it is normal; that is, there exists a unique

$ {{{x}^{*}}}_{\tilde{A}}\in \rm{R} $

such that

$ {\mu }_{\tilde{A}} $

(

$ {{{x}^{*}}}_{\tilde{A}} $

) = 1. For every

$ \alpha \in [0, 1] $

, the set

$ \tilde{A}\left[\alpha \right]=\{x\in \rm{R} \colon {\mu }_{\tilde{A}}(x)\geqslant {\text{α}} \} $

forms a non-empty closed interval in

$ \rm{R} $

. This interval is represented as

$ \tilde{A}\left[\alpha \right]=[{\tilde{A}}_{L\alpha },{\tilde{A}}_{U\alpha }] $

, where

$ {\tilde{A}}_{L\alpha }=\mathrm{inf} \{x\colon x\in \tilde{A}\left[\alpha \right] $

} and

$ {\tilde{A}}_{U\alpha }=\mathrm{sup} \{x\colon x\in \tilde{A}\left[\alpha \right]\}. $

It should be noted that fuzzy numbers represent approximate assessments provided by experts and accepted by decision-makers when obtaining more precise values is either impossible or unnecessary. To facilitate the representation and manipulation of fuzzy numbers, several authors have expressed the information contained in a fuzzy number using a functional parametric form known as an LR-fuzzy number

$ \tilde{A}={\left(a;{{l}_{\tilde{A}}},{{r}_{\tilde{A}}}\right)}_{LR}. $

The membership function of an LR-fuzzy number

$ \tilde{A} $

is defined as follows (Equation 1):

(1)

$ {\mu }_{\tilde{A}}\left(x\right)=\begin{cases} L\left(\dfrac{a-x}{{l}_{\tilde{A}}}\right),x\leqslant a\\ R\left(\dfrac{x-a}{{r}_{\tilde{A}}}\right),x> a\\ \end{cases} $

Where

$ a\in {\rm{R}},{l}_{\tilde{A}}> 0,{r}_{\tilde{A}}> 0 $

are referred to as the center value and the left and right spreads of

$ \tilde{A} $

, respectively. The shape functions L(x) and R(x) are decreasing functions from [0, +

$ \boldsymbol{\infty } $

)

$ \rightarrow [0, 1] $

. with the following properties (Equations 2–5):

(2)

$ \mathit{L} \mathrm{(0)=1} $

(3)

$ L( x )<1( \forall x >0) $

(4)

$ L( x )>0( \forall x <1) $

(5)

$ L(1)=0 ({\mathrm{or}}\; L( x )>0\; ( \forall x ))\; L(+\infty)=0. $

The LR-number has been applied in various problems as a general model of imprecision. In special cases where L = R,

$ \tilde{A}={\left(a;{{l}_{\tilde{A}}},{{r}_{\tilde{A}}}\right)}_{LR} $

is referred to as a symmetric LR-fuzzy number, denoted by

$ \tilde{A}={\left(a;{{l}_{\tilde{A}}}\right)}_{L} $

In this paper, the most commonly used LR-fuzzy numbers (with L(

$ x $

) = R(

$ x $

) = max{0,1−

$ x $

}), known as triangular fuzzy numbers (TFNs), are employed to manage imprecision across the dataset during numerical evaluations. The membership function of a triangular fuzzy number, denoted by

$ \tilde{A}={\left(a;{{l}_{\tilde{A}}},{{r}_{\tilde{A}}}\right)}_{T} $

, is given by (Equation 6):

(6)

$ {\mu }_{\tilde{A}}\left(x\right)=\begin{cases} \dfrac{x-\left(a-{l}_{\tilde{A}}\right)}{{l}_{\tilde{A}}},\;\;a-{l}_{\tilde{A}}\leqslant x< a\\ \dfrac{a+{r}_{\tilde{A}}-x}{{r}_{\tilde{A}}},\;\;a< x< a+{r}_{\tilde{A}},\\ 0,\;\;x\in R-\left(a-{l}_{\tilde{A}},a+{r}_{\tilde{A}}\right) \end{cases} $

Common operations between two LR-fuzzy numbers

$ \tilde{A}={\left(a;{{l}_{\tilde{A}}},{{r}_{\tilde{A}}}\right)}_{LR} $

and

$ \tilde{B}={\left(b;{{l}_{\tilde{B}}},{{r}_{\tilde{B}}}\right)}_{LR} $

are defined as follows (Equations 7–8):

Addition:

(7)

$ \tilde{A} \oplus \tilde{B} = {\left(a+b;{{l}_{\tilde{A}}}+{{l}_{\tilde{B}}},{{r}_{\tilde{A}}}+{{r}_{\tilde{A}}}\right)}_{LR} . $

Scalar Multiplication:

(8)

$ \begin{split} & if\; \lambda > 0\colon \lambda \otimes \tilde{A}={\left(\lambda a;\lambda {{l}_{\tilde{A}}},\lambda {{r}_{\tilde{A}}}\right)}_{LR},if\;\lambda < 0 : \\&\quad \lambda \otimes \tilde{A} = {\left(\lambda a;-\lambda {{r}_{\tilde{A}}},-\lambda {{l}_{\tilde{A}}}\right)}_{RL}.\end{split} $

Generalized subtraction is also defined and analyzed as follows:

$ \tilde{A}={\left(a;{{l}_{\tilde{A}}},{{r}_{\tilde{A}}}\right)}_{LR} $

and

$ \tilde{B}={\left(b;{{l}_{\tilde{B}}},{{r}_{\tilde{B}}}\right)}_{LR} $

There are two LR-fuzzy numbers, the Hukuhara subtraction between

$ \tilde{A} $

and

$ \tilde{B} $

is defined as (Equation 9):

(9)

$ W = {\left(a-b;max\left({{l}_{\tilde{A}}}-{{l}_{\tilde{B}}}, 0\right),max\left({{r}_{\tilde{A}}}-{{r}_{\tilde{B}}}, 0\right)\right)}_{LR}. $

Notably, this subtraction method is superior to the common subtraction operator because it yields an LR-fuzzy number when applied to fuzzy data. Additionally, the square error distance between two LR-fuzzy numbers

$ \tilde{A}={\left(a;{{l}_{\tilde{A}}},{{r}_{\tilde{A}}}\right)}_{LR} $

and

$ \tilde{B}= {\left(b;{{l}_{\tilde{B}}},{{r}_{\tilde{B}}}\right)}_{LR}, $

as utilized in this paper, is defined as follows (Equation10):

(10)

$ {D}^{2}\left(\tilde{A},\tilde{B}\right)=\sqrt{\frac{{\left| a-b\right| }^{2}+{c}_{1}{\left| {l}_{\tilde{A}}-{l}_{\tilde{B}}\right| }^{2}+{c}_{2}{\left| {r}_{\tilde{A}}-{r}_{\tilde{B}}\right| }^{2}}{3}} $

Where

$ {c}_{1}=\displaystyle\int \nolimits_{0}^{1}{L}^{-1}(\alpha )d\alpha $

and

$ {c}_{2}=\displaystyle\int \nolimits_{0}^{1}{R}^{-1}\left(\alpha \right)d\alpha . $

For any three triangular fuzzy numbers A, B, and C, the following conditions are satisfied:

Condition 1:

$ {D}^{2}\left(\tilde{A},\tilde{B}\right)=0\;\mathrm{if}\; \mathrm{and}\; \mathrm{only}\; \text{if}\;A=B, $

Condition 2:

$ {D}^{2}\left(\tilde{A},\tilde{B}\right)={D}^{2}\left(\tilde{B},\tilde{A}\right), $

Condition 3:

$ {D}^{2}\left(\tilde{A},\tilde{C}\right)\leqslant {D}^{2}\left(\tilde{A},\tilde{B}\right)+{D}^{2}\left(\tilde{B},\tilde{C}\right). $

This framework provides valuable insights for addressing imprecision in data analysis through the use of fuzzy numbers.

2.3.2 Fuzzy Least Squares Regression Model (FLSR)

Fuzzy Least Squares Regression (FLSR) is understood as a development of conventional least squares regression, with specific adaptation for addressing data uncertainty and imprecision. In contrast to traditional regression techniques, which rely on precise input values, fuzzy numbers are utilized by FLSR to represent data points, allowing for a more flexible modeling approach. Scenarios in which data exhibits variability or lacks precision are particularly suited to this method.

Within FLSR, the distance between fuzzy numbers is defined, and the minimization of the sum of squared deviations is pursued. This objective is in accordance with the aims of standard least squares methods, while the inherent uncertainty within the data is accommodated.

The general equation for a fuzzy linear regression model can be expressed as follows (Equation 11):

(11)

$ {\tilde{y}}_{i}=\sum \limits_{j=1}^{n}\tilde{{\beta }_{j}}\otimes {x}_{ij}+{\tilde{\epsilon }}_{i} $

Where:

$ {\tilde{y}}_{i} $

is the fuzzy output (dependent variable).

$ {x}_{ij} $

represents the inputs (independent variables).

$ \tilde{{\beta }_{j}} $

represents the fuzzy coefficients that are to be estimated.

$ \otimes $

indicates the fuzzy multiplication operator.

$ {\tilde{\epsilon }}_{i} $

is identified as the fuzzy error term.

The minimization of the sum of squared deviations between the observed fuzzy outputs and the estimated fuzzy outputs is the primary goal of FLSR. This gives rise to an optimization problem expressed as (Equation 12):

(12)

$ J={{\sum \limits_{i=1}^{m}}\left\|{\tilde{y}}_{i}-\sum \limits_{j=1}^{n}\tilde{{\beta }_{j}}\otimes {x}_{ij}\right\|^{2} }$

Effective modeling of relationships within datasets where traditional regression methods may encounter difficulties due to data uncertainty is facilitated by this formulation. The application of FLSR has been explored in a range of fields, and its ability to provide dependable predictions, even when limited or imprecise datasets are present, has been demonstrated. Overall, the robustness of regression analysis is improved by FLSR through the incorporation of fuzzy logic principles, which leads to enhanced prediction reliability in uncertain environments.

In the current study, the FLSR model has been implemented to evaluate and assess the accuracy of the proposed model (FNAR).

2.3.3 Fuzzy Nonlinear Additive Regression (FNAR) model

In a fuzzy additive regression model that includes both fuzzy predictors and fuzzy responses, it is assumed that the observed data for n statistical units is represented as (

$ {({{\tilde{y}}_{i}},{{\tilde{x}}_{i}}=\left({{\tilde{x}}_{i1}},{{\tilde{x}}_{i2}},\cdots ,{{\tilde{x}}_{ip}}\right)}^{T} $

). Based on this dataset, the following fuzzy additive regression model can be formulated (Equation13):

(13)

$ {\tilde{y}}_{i} = {\tilde{f}}_{1}\left({\tilde{x}}_{i1}\right)\oplus {\tilde{f}}_{2}\left({\tilde{x}}_{i2}\right)\oplus \cdots \oplus {\tilde{f}}_{p-1}\left({\tilde{x}}_{i\left(p-1\right)}\right)\oplus {\tilde{f}}_{p}({\tilde{x}}_{ip})\oplus {\tilde{\epsilon }}_{i} $

which can also be written as (Equation14):

(14)

$ {\tilde{y}}_{i} = \oplus _{l=1}^{p}{\tilde{f}}_{l}\left({\tilde{x}}_{il}\right)\oplus {\tilde{\epsilon }}_{i}, $

Where:

$ {\tilde{y}}_{i}={\left({{y}_{i}};{{l}_{{{\tilde{y}}_{i}}}},{{r}_{{{\tilde{y}}_{i}}}}\right)}_{LR} $

, denotes the fuzzy responses,

$ {\tilde{x}}_{ij} $

represents the non-fuzzy predictors for fuzzy linear regression,

$ {\tilde{f}}_{l}\left({\tilde{x}}_{il}\right)={\left({{\tilde{f}}_{l}}\left({\tilde{x}}_{il}\right);{{\tilde{f}}_{l}}\left({\tilde{x}}_{il}\right),{{\tilde{f}}_{r}}\left({\tilde{x}}_{il}\right)\right)}_{LR} $

indicates the fuzzy smooth functions to be estimated, and

$ {\tilde{\epsilon }}_{i} $

signifies a fuzzy error term.

To estimate the fuzzy smooth functions.

$ {\tilde{f}}_{1}, {\tilde{f}}_{2},\cdots , {\tilde{f}}_{p} $

at a specified point x =

$ {\left({{\tilde{x}}_{1}},{{\tilde{x}}_{2}},\cdots ,{{\tilde{x}}_{p}}\right)}^{T}, $

a p-step back fitting additive method is proposed as follows:

Step 1

Initially, the following univariate fuzzy nonlinear regression model is considered (Equation15):

(15)

$ {\tilde{y}}_{i} = {\tilde{f}}_{1}\left({\tilde{x}}_{i1}\right)\oplus {\tilde{\epsilon }}_{i1}. $

By extending a well-known nonparametric estimator in the fuzzy domain,

$ {\hat{{\tilde{f_l}}}}^{K}\left({\tilde{x}}_{1}\right) $

can be determined as follows (Equation 16):

(16)

$ {\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right)=\oplus _{j=1}^{n}({w}_{1j}\left({\tilde{x}}_{i1};{h}_{1}\right)\otimes {\tilde{y}}_{j}, $

Where:

$ {w}_{1j}\left({\tilde{x}}_{i1};{h}_{1}\right) $

= K

$ \left(\frac{D\left({\tilde{x}}_{i1},{\tilde{x}}_{j1}\right)}{{h}_{1}} \right)$

$ \sum \nolimits_{j=1}^{n}K\left(\frac{D\left({\tilde{x}}_{i1},{\tilde{x}}_{j1}\right)}{{h}_{1}}\right). $

Here, K(·) denotes a kernel function and

$ {h}_{1} $

represents an unknown bandwidth parameter.

Step 2

The model from Step 1 is extended to include a one-step-ahead fuzzy nonlinear function.

$ {\tilde{f}}_{2} $

(Equation 17):

(17)

$ {\tilde{y}}_{i} = {\tilde{f}}_{1}\left({\tilde{x}}_{i1}\right)\oplus {\tilde{f}}_{2}\left({\tilde{x}}_{i2}\right)\oplus {\tilde{\epsilon }}_{2i}. $

Substituting

$ {\hat{{\tilde{f_l}}}}^{K}\left({\tilde{x}}_{i1}\right) $

into this equation yields (Equation 18):

(18)

$ {\tilde{y}}_{i} = {\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right)\oplus {\tilde{f}}_{2}\left({\tilde{x}}_{i2}\right)\oplus {\tilde{\epsilon }}_{2i}. $

Applying Hukuhara subtraction on both sides results in a fuzzy nonlinear univariate regression model (Equation 19):

(19)

$ {\tilde{y}}_{i}{\ominus }_{H}{\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right)={\tilde{f}}_{2}\left({\tilde{x}}_{i2}\right)\oplus {\tilde{\epsilon }}_{2i}. $

Following a similar approach as in Step 1, the fuzzy nonparametric estimator of

$ {\tilde{f}}_{2}\left({\tilde{x}}_{i2}\right) $

can be evaluated as (Equation 20):

(20)

$ {\hat{{\tilde{f_2}}}}^{K}\left({\tilde{x}}_{i2}\right)= \oplus _{j=1}^{n}\left({w}_{2j}\left({\tilde{x}}_{i2};{h}_{2}\right)\otimes \left({\tilde{y}}_{j}{\ominus }_{H}{\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right)\right)\right), $

Where:

$ {{{w}_{j}}}^{2}\left({\tilde{x}}_{i2};{h}_{2}\right) $

= K

$ \left(\frac{D\left({\tilde{x}}_{i2},{\tilde{x}}_{j2}\right)}{{h}_{2}}\right) $

$ \sum \nolimits_{j=1}^{n}K\left(\frac{D\left({\tilde{x}}_{i2},{\tilde{x}}_{j2}\right)}{{h}_{2}}\right) $

in which

$ {h}_{2} $

is an unknown bandwidth parameter.

Step 3

In this step, Equation 15 is considered with a one-step-ahead fuzzy nonlinear function.

$ {\tilde{f}}_{3} $

, represented as (Equation 21):

(21)

$ {\tilde{y}}_{i} = {\tilde{f}}_{1}\left({\tilde{x}}_{i1}\right)\oplus {\tilde{f}}_{2}\left({\tilde{x}}_{i2}\right)\oplus {\tilde{f}}_{3}\left({\tilde{x}}_{i3}\right)\oplus {\tilde{\epsilon }}_{3i}. $

The values of

$ {\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right) $

and

$ {\hat{{\tilde{f_2}}}}^{K}\left({\tilde{x}}_{i2}\right) $

, derived from the evaluations in Steps 1 and 2, are substituted into the Equation22:

(22)

$ {\tilde{y}}_{i} = {\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right)\oplus {\hat{{\tilde{f_2}}}}^{K}\left({\tilde{x}}_{i2}\right)\oplus {\tilde{f}}_{3}\left({\tilde{x}}_{i3}\right)\oplus {\tilde{\epsilon }}_{3i}. $

By applying Hukuhara subtraction to both sides of this equation, a fuzzy nonlinear univariate regression model is formed (Equation 23):

(23)

$ {\tilde{y}}_{i}{\ominus }_{H}({\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right)\oplus {\hat{{\tilde{f_2}}}}^{K}\left({\tilde{x}}_{i2}\right))={\tilde{f}}_{3}\left({\tilde{x}}_{i3}\right)\oplus {\tilde{\epsilon }}_{3i}. $

Following the methodology established in Steps 1 and 2, the fuzzy nonparametric estimator

$ {\tilde{f}}_{3}\left({x}_{i3}\right) $

can be evaluated as follows (Equation 24):

(24)

$ \begin{split} & {\hat{{\tilde{f_3}}}}^{K}\left({\tilde{x}}_{3}\right)=\\& \oplus _{j=1}^{n}\left({w}_{3j}\left({\tilde{x}}_{i3};{h}_{3}\right)\otimes \left({\tilde{y}}_{j}{\ominus }_{H}\left({\hat{{\tilde{f_1}}}}^{K}\left({\tilde{x}}_{i1}\right)\oplus {\hat{{\tilde{f_2}}}}^{K}\left({\tilde{x}}_{i2}\right)\right)\right)\right),\end{split} $

In this equation, the weight function is defined as

$ {{{w}_{j}}}^{3}\left({\tilde{x}}_{i3};{h}_{3}\right) $

= K

$ \left(\frac{D\left({\tilde{x}}_{i3},{\tilde{x}}_{j3}\right)}{{h}_{3}}\right) $

$ \sum \nolimits_{j=1}^{n}K\left(\frac{D\left({\tilde{x}}_{i3},{\tilde{x}}_{j3}\right)}{{h}_{3}}\right)$

, where

$ {h}_{3} $

represents the unknown band width parameter.

Step (p-1)

Continuing with the previously established procedure, a fuzzy nonparametric estimator

$ {\tilde{f}}_{p-1}\left({\tilde{x}}_{i(p-1)}\right) $

can be obtained at the

$ {(p-1)}^{th} $

step as follows (Equation 25):

(25)

$ \begin{split} & {\hat{\tilde{f} }}_{p-1}^{K}\left({\tilde{x}}_{i(p-1)}\right)=\\& \oplus _{j=1}^{n} \left({w}_{p-1j}\left({\tilde{x}}_{i(p-1)};{h}_{(p-1)}\right) \otimes \left({\tilde{y}}_{j}{\ominus }_{H} \left(\oplus _{v=1}^{p-2}\oplus {\hat{{\tilde{f_v}}}}^{K}\left({\tilde{x}}_{iv}\right) \right) \right) \right),\end{split} $

The weight function for this step is given by

$ {w}_{p-1j} \left({\tilde{x}}_{ip-1};{h}_{(p-1}\right) $

= K

$ \left(\frac{D\left({\tilde{x}}_{i(p-1)},{\tilde{x}}_{j(p-1)}\right)}{{h}_{(p-1)}}\right) $

$ \sum \nolimits_{j=1}^{n}K\left(\frac{D\left({\tilde{x}}_{i(p-1)},{\tilde{x}}_{j(p-1)}\right)}{{h}_{(p-1)}}\right) $

where

$ {h}_{p-1} $

denotes the unknown bandwidth parameters.

Step (p)

In the final step, a fuzzy additive regression model is developed by incorporating

$ \tilde{{f}_{p}}\left({x}_{ip}\right) $

into the expression from the previous step (Equation 26):

(26)

$ {\tilde{y}}_{i} = \oplus _{l=1}^{p-1}{\tilde{f}}_{l}\left({\tilde{x}}_{il}\right)\oplus {\tilde{f}}_{p}\left({\tilde{x}}_{ip}\right)\oplus {\tilde{\epsilon }}_{i}. $

By substituting

$ \tilde{{\hat{f_1}}}\left({\tilde{x}}_{i1}\right),\tilde{{\hat{f_2}}}\left({\tilde{x}}_{i2}\right),\cdots ,{\hat{\tilde{f} }}_{(p-1)}\left({\tilde{x}}_{i(p-1)}\right) $

from earlier steps and applying Hukuhara subtraction on both sides, a univariate fuzzy nonparametric regression model is formed as follows (Equation 27):

(27)

$ {\tilde{y}}_{i}{\ominus }_{H}\oplus _{l=1}^{p-1}{\tilde{\hat{f} }}_{l}\left({\tilde{x}}_{il}\right)={\tilde{f}}_{p}\left({\tilde{x}}_{ip}\right)\oplus {\tilde{\epsilon }}_{pi}. $

Similar to previous steps, the estimator

$ {\tilde{f}}_{p}\left({\tilde{x}}_{p}\right) $

can be evaluated at this final stage as (Equation28):

(28)

$ \begin{split} & {\hat{\tilde{f} }}_{p}^{K}\left({\tilde{x}}_{ip}\right)=\\& \oplus _{j=1}^{n}\left({w}_{jp}\left({\tilde{x}}_{ip};{h}_{p}\right)\otimes \left({\tilde{y}}_{j}{\ominus }_{H}\left(\oplus _{v=1}^{p-1}\oplus {\hat{{\tilde{f_v}}}}^{K}\left({\tilde{x}}_{iv}\right)\right)\right)\right),\end{split} $

The weight function for this step is defined as

$ {{{w}_{j}}}^{p}\left({\tilde{x}}_{p};{h}_{p}\right) $

= K

$\left( \frac{D\left({\tilde{x}}_{ip},{\tilde{x}}_{jp}\right)}{{h}_{p}}\right) $

$ \sum \nolimits_{j=1}^{n}K\left(\frac{D\left({\tilde{x}}_{ip},{\tilde{x}}_{jp}\right)}{{h}_{p}}\right). $

where

$ {h}_{p} $

represents the unknown bandwidth parameter.

In this study, the Gaussian kernel was used to evaluate its effects on model performance (Equation 29).

(29)

$ K\left(y\right) = \frac{1}{\sqrt{2\pi }}{e}^{\frac{{y}^{2}}{2}},\quad y\in {\rm{R}} $

The FNAR model's pseudo-code is provided in the supplementary material (S1).

2.4 Validation of models

The predictive performance of the FNAR model was evaluated using a comprehensive assessment framework that included five statistical metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Nash-Sutcliffe Efficiency (NSE), Kling-Gupta Efficiency (KGE), and Percent Bias (PBIAS). The Equations are defined as follows (Equations 30–34):

(30)

$ RMSE=\sqrt{\frac{1}{n}\sum \limits_{i=1}^{n}D_{2}^{2}\left({\tilde{Y}}_{i},{\tilde{\hat{Y} }}_{i}\right)} $

(31)

$ MAE=\frac{1}{n}\sum \limits_{i=1}^{n}{D}_{2}\left({\tilde{Y}}_{i},{\tilde{\hat{Y} }}_{i}\right) $

(32)

$ NSE=1-\frac{\sum \nolimits_{i=1}^{n}D_{2}^{2}\left({\tilde{Y}}_{i},{\tilde{\hat{Y} }}_{i}\right)}{\sum \nolimits_{i=1}^{n}D_{2}^{2}\left({\tilde{Y}}_{i},{\tilde{\bar{Y} }}_{i}\right)} $

(33)

$ KGE=1-\sqrt{{\left(r-1\right)}^{2}+{\left(\alpha -1\right)}^{2}+{\left(\beta -1\right)}^{2}} $

(34)

$ PBIAS=100\times \frac{\sum \nolimits_{i=1}^{n}\left({\tilde{Y}}_{i}-{\tilde{\hat{Y} }}_{i}\right)}{\sum \nolimits_{i=1}^{n}{\tilde{Y}}_{i}} $

In these equations,

$ {D}_{2} $

represents the distance between the observed fuzzy value

$ {\tilde{Y}}_{i} $

and the predicted fuzzy value

$ {\tilde{\hat{Y} }}_{i} $

$ {\tilde{\bar{Y} }}_{i} $

indicates the mean of the fuzzy observed values, and n signifies the total number of observations. The symbol Σ indicates that summation is performed over all values of i. For the KGE computation, the defuzzified centroid values of fuzzy observations and predictions were used.

The distance metric

$ {D}_{2} $

for two fuzzy numbers,

$ \tilde{A}={(a;{{l}_{a}},{{r}_{a}})}_{LR} $

and

$ \tilde{B}={(b;{{l}_{b}},{{r}_{b}})}_{LR} $

is defined as

${D}_{2} \left(\tilde{A},\tilde{B}\right) $

$ {\left({\left({\left| a-b\right| }^{2}+{c}_{1}{\left| {l}_{a}-{l}_{b}\right| }^{2}+{c}_{2}{\left| {r}_{a}-{r}_{b}\right| }^{2}\right)}/{3}\right)}^{0.5}$

where

$ {c}_{1}=\displaystyle\int \nolimits_{0}^{1}{L}^{-1}\left(\alpha \right)d\alpha $

and

$ {c}_{2}=\displaystyle\int \nolimits_{0}^{1}{R}^{-1}\left(\alpha \right)d\alpha $

. (Asadolahi et al. 2021).

The flowchart depicting the interaction of inputs and outputs in the FNAR modeling framework, used for estimating monthly GWL, is shown in Fig. 3.

The FNAR modeling workflow (Fig. 3) integrates data preprocessing, scenario-based input selection, and fuzzy nonlinear additive regression. The analytical core of the FNAR model lies in its fuzzy nonlinear additive structure, which assumes that groundwater level response can be decomposed into the sum of individual, smooth fuzzy functions of each climatic driver. This formulation is particularly suited to arid aquifers like Birjand, where recharge processes are largely driven by discrete, separable mechanisms (e.g., episodic rainfall infiltration vs. continuous evapotranspiration loss). The additive framework enhances interpretability by isolating the marginal effect of each variable, a critical advantage for water managers seeking actionable insights. Thus, the FNAR architecture represents a pragmatic compromise between physical interpretability, mathematical simplicity, and predictive performance in data-limited settings.

3 Results and discussion

The following section is focused on the presentation and description of the results derived from the three FNAR model scenarios on a monthly basis, beginning with the evaluation criteria obtained and concluding with the estimated maps.

3.1 Model performance

The FNAR model was evaluated under three scenarios with gradually increasing input complexity to investigate the trade-off between data requirements and prediction accuracy. Scenarios were: Sc1 (Prc, Tave), Sc2 (Prc, Tave, RH), and Sc3 (Prc, Tave, RH, ETo). Performance was assessed using RMSE, MAE, NSE, and PBIAS. This hierarchical design enabled assessment of marginal performance gains associated with additional input parameters, informing optimal model configuration for practical applications.

Comprehensive performance metrics for all three FNAR scenarios are presented in Tables 3, alongside equivalent FLSR model results for comparison (Table 4). All three FNAR scenarios substantially outperformed their equivalent FLSR counterparts, with NSE improvements ranging from 45% to 120%, demonstrating the critical importance of capturing nonlinear relationships in GWL dynamics.

This is particularly beneficial in regions with limited data availability, where traditional physical or numerical models may not be feasible (Ahmadi et al. 2022). Using only precipitation and temperature as inputs, Sc1 achieved RMSE values ranging from 1.05 m to 1.55 m, MAE from 0.40 m to 1.26 m, NSE from 0.32 to 0.83, and PBIAS from 0.448% to 0.847% across different months. The inclusion of relative humidity in Sc2 (Prc + Tave + RH) substantially improved performance, reducing RMSE to 0.42–0.99 m and increasing NSE to 0.28–0.97, consistent with recent evidence that RH modulates actual evapotranspiration and soil moisture retention, key controls on net recharge in water-limited environments (Feng et al. 2024). Further addition of reference evapotranspiration in Sc3 (Prc + Tave + RH + ETo) provided marginal gains, achieving the best overall accuracy with RMSE of 0.31–0.77 m and NSE of 0.66–0.98.

Although Sc3 yielded the highest predictive accuracy (average RMSE = 0.53 m, NSE = 0.87), Sc2 was selected as the optimal operational scenario due to its favorable trade-off between performance and data accessibility. Sc2 exhibited only a modest reduction in accuracy, with an average RMSE of 0.68 m and NSE of 0.78 compared to Sc3. However, in the context of regional groundwater management, where decision thresholds for aquifer depletion typically range from ± 0.5 m to 1.0 m (Lall et al. 2020), the performance of Sc2 remains within acceptable limits. This trade-off aligns with recent findings by Samantaray et al. (2024), who demonstrated that Prc, Tave, and RH are sufficient to drive robust GWL predictions in data-limited settings when combined with flexible modeling frameworks. Similarly, Feng et al. (2024) emphasized the dominance of these three variables in capturing recharge dynamics in semi-arid systems.

The FNAR model's advantage lies in the ability of fuzzy logic handle nonlinear relationships with fewer input parameters, making it suitable for data-sparse regions (Bardossy and Duckstein, 2022). Also, Borzì (2025) notes that the trade-off between model complexity and data requirements remains a significant consideration, favoring nonlinear configurations for deployability in arid aquifers.

Critically, Sc2 eliminates the need for ETo, which requires high-quality measurements of solar radiation, wind speed, and vapor pressure, data rarely available in basic monitoring networks, particularly in arid regions like Birjand. In contrast, precipitation, temperature, and relative humidity are routinely recorded at low-cost stations, greatly enhancing model applicability in data-scarce contexts. Given the marginal accuracy trade-off and significant gain in practical feasibility, Sc2 represents the most appropriate configuration for real-world deployment. This aligns with the emerging paradigm in sustainable hydroinformatics: Prioritizing "good-enough" models with high deployability over marginally superior but data-demanding alternatives (Borzì, 2025; Loucks, 2023). In arid regions like Birjand, where ETo estimation requires solar radiation and wind speed data rarely available outside research-grade stations, Sc2 offers a pragmatic solution for real-world groundwater monitoring.

To better visualize the performance of both models, scatter plots for each equivalent scenario are illustrated in Fig. 4. The horizontal axes are used to represent the monthly observational data of the GWL in the Birjand Plain from 1998 to 2017, while the vertical axes are utilized to display the calculated data for the corresponding levels. The plots visually confirm the quantitative results summarized in Tables 2 and 3: FNAR predictions (red dots) consistently cluster closer to the 1:1 line than FLSR predictions (blue dots), reflecting superior agreement between modeled and observed values. This improvement is most pronounced in Sc2 and Sc3, where the inclusion of RH and ETo further reduces prediction error. Notably, FLSR models exhibit greater dispersion and systematic bias (e.g., under-prediction at high GWL values), particularly in Sc1 and Sc2, underscoring the limitations of linear modeling in capturing the nonlinear dynamics governing groundwater response to climate forcing. The tight clustering of FNAR points around the 1:1 line, especially for Sc2 and Sc3, validates the model's ability to accurately capture both central tendency and variability across the full range of GWL conditions.

3.2 Temporal patterns and seasonal performance variations

Model performance exhibited strong seasonal dependence, with peak accuracy during winter–spring (December–March; NSE > 0.85 for Sc2). This reflects the dominance of direct, episodic recharge during the wet season and reduced evaporative losses due to low temperatures. Such seasonality-driven predictability has been consistently observed in semi-arid aquifers, where GWL response is tightly coupled to rainfall timing and intensity (Zowam and Milewski, 2024; Zarinmehr et al. 2022). Performance declined during transitional months (April–May, September–November; NSE: 0.65–0.80), as recharge becomes intermittent and irrigation pumping intensifies. Additionally, irrigation withdrawals for agricultural activities, primarily wheat and barley cultivation in Birjand Plain, introduce anthropogenic variability not explicitly represented in meteorological inputs. This mirrors findings by Feng et al. (2024) in northern China, where hybrid models struggled to disentangle natural recharge from anthropogenic withdrawals during crop establishment phases. Summer (June–August) posed the greatest challenge (NSE: 0.28–0.55), as GWL dynamics are governed almost entirely by unmeasured agricultural pumping, a known blind spot for meteorology-only models (Samantaray et al. 2024). This limitation underscores a fundamental trade-off: While excluding pumping data enhances model generalizability across data-scarce regions, it inherently caps summer prediction skill. Similar challenges are noted in hybrid ELM models for Bangladeshi aquifers (Adnan et al. 2023), where meteorology-only inputs yielded NSE of approximately 0.73 but faltered during dry seasons due to unmodeled extraction, advocating parsimonious inputs for broader applicability.

3.3 Comparison with State-of-the-Art GWL models

While direct quantitative comparison is constrained by site-specific differences in aquifer properties, GWL ranges, and climatic conditions, several important insights emerge, from comparing the outcomes with the litureture to help contextualize FNAR performance within the broader landscape of GWL prediction models. Samantaray et al. (2024) reported RMSE of 0.045–0.12 m and NSE of 0.92–0.96 using the SVR-FFAPSO model in Indian watersheds, but their model required antecedent GWL, streamflow, and six climatic inputs, rendering it inapplicable to ungauged basins. Bhadani et al. (2024) achieved correlation coefficient of 0.89 and NMSE of 0.18 with the F-IWO-GWL algorithm yet depended on historical GWL and soil texture data, both often unavailable in developing regions. In contrast, the present results with FNAR under Sc2 attain NSE up to 0.97 using only three meteorological variables, with no lagged GWL. This input parsimony is not a compromise but a strategic advantage: It enables deployment in the >70% of global aquifers lacking continuous piezometric monitoring (Jiang et al. 2024). Moreover, unlike black-box hybrids such as ANFIS-GWO (Vadiati et al. 2023) or CNN-LSTM ensembles (Yang and Zhang, 2022), FNAR's additive structure provides interpretable attribution thereby supporting informed management (Wang and Li, 2025).

A critical advantage of FNAR lies in its input data requirements. Samantaray et al. (2022) required 18 years of data with six input variables including discharge and antecedent GWL. Vadiati et al. (2023) utilized lagged GWL measurements (t-1, t-2, t-3 months), necessitating continuous monitoring infrastructure. Machine learning-based fuzzy models typically require extensive hyperparameter optimization. For example, SVR-FFAPSO (Samantaray et al. 2024) necessitates tuning of support vector regression parameters plus firefly-particle swarm optimization parameters (population size, generations, inertia weights, attraction coefficients), representing a 7-10 dimensional optimization problem. ANFIS-GWO (Samantaray et al. 2022) similarly requires optimization of fuzzy membership functions plus grey wolf optimizer parameters. Conversely, FNAR requires only the bandwidth parameter (h) selection for each input dimension through leave-one-out cross-validation, resulting in a substantially simpler optimization landscape.

An additional advantage of FNAR over black-box machine learning approaches lies in interpretability. The additive structure enables decomposition of GWL predictions into contributions from individual climatic drivers, facilitating physical understanding of aquifer responses. Machine learning models, while often achieving marginally higher accuracy, provide limited insight into underlying mechanisms, constraining their utility for process understanding and stakeholder communication. This agrees with Borzì (2025), who advocates hybrid data-driven approaches in data-scarce regions to balance "model complexity" with "practical applicability".

The FNAR model's success in GWL prediction can be attributed to its capability to capture nonlinear relationships inherent in aquifer-climate interactions. Aquifer recharge exhibits threshold behavior, with negligible recharge during small precipitation events that are entirely consumed by evapotranspiration and soil moisture deficit replenishment. Recharge increases nonlinearly with precipitation magnitude above threshold values, as demonstrated by lysimeter studies in arid regions (Hu et al. 2020; Koonce, 2016). The FNAR model's nonparametric kernel-weighted structure effectively captures these threshold dynamics without requiring explicit specification of functional forms, providing flexibility absent in linear models like FLSR. Finally, FNAR's fuzzy framework explicitly propagates input uncertainty into prediction intervals, a feature increasingly mandated in modern hydrological modeling (Moges et al. 2021). Deterministic models, by contrast, yield overconfident point estimates that mask data quality issues, potentially misleading decision-makers during droughts or policy evaluations.

3.4 Advantages, limitations, and future direction

This study highlights the effectiveness of the FNAR model for predicting groundwater levels in data-scarce arid regions. Using only precipitation, temperature, and relative humidity, the model achieves strong accuracy (NSE > 0.75), quantifies uncertainty through fuzzy outputs, offers interpretable climate-driven insights, and operates effectively without prior groundwater observations. However, this study faced limitations primarily related to data availability and quality. The use of monthly averaged GWL measurements, often recorded on random days, may have introduced discrepancies, especially if significant rainfall occurred after data collection. Inconsistent measurement timing across wells also posed challenges, as monthly averaged weather variables assumed uniform conditions, potentially misattributing water level variances. Although fuzzification helped mitigate these issues, more precise data would have enhanced model accuracy. These limitations highlight the need for higher temporal resolution and consistent data collection practices to improve model reliability.

Future research should integrate remotely sensed proxies for groundwater extraction, such as GRACE satellite data, land surface temperature, or NDVI-based crop water use estimates. In addition, hybridizing FNAR with physics-informed constraints (e.g., simplified mass balance equations) could improve extrapolation under non-stationary climate conditions. Finally, extending the framework to spatially distributed predictions (by coupling FNAR with geostatistical interpolation of piezometric data) would enable well-specific forecasts, supporting localized management interventions in heterogeneous aquifers. Such extensions align with recommendations in hydroinformatics reviews for adaptive, multi-scale modeling to enhance sustainability in arid basins

4 Conclusion

Sustainable groundwater management in arid- and semi-arid regions requires predictive models that are both accurate and feasible under severe data constraints. In this study, we addressed this need by developing and presenting the FNAR model for monthly GWL prediction in the Birjand aquifer, Iran. Using only meteorological inputs under three scenarios, FNAR achieved high predictive accuracy, with the optimal scenario (Sc2: Precipitation, temperature, relative humidity) yielding an average NSE of 0.78 and RMSE of 0.68. The FNAR model does not require antecedent GWL measurements, allowing deployment in ungauged or poorly monitored basins. It also demonstrates that accurate predictions are achievable with only three widely available meteorological variables.

Future research will extend FNAR to spatially distributed forecasting and integrate remote sensing proxies (e.g., GRACE, NDVI) to implicitly capture anthropogenic influences, addressing the model's current limitation in summer months, thus enhancing its predictive capabilities and enabling its application to a diverse range of hydrological settings.

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Adnan RM, Dai HL, Mostafa RR, et al. 2023. Modelling groundwater level fluctuations by ELM merged advanced metaheuristic algorithms using hydroclimatic data. Geocarto International, 38(1): 2158951. DOI: 10.1080/10106049.2022.2158951.

[2]	Ahmadi A, Olyaei M, Heydari Z, et al. 2022. Groundwater level modeling with machine learning: A systematic review and meta-analysis. Water, 14(6): 949. DOI: 10.3390/w14060949.

[3]	Ahmadifar R, Safavi HR, Mirabbasi R, et al. 2025. A hybrid vine copula-fuzzy model for groundwater level simulation under uncertainty. Environmental Monitoring and Assessment, 197(4): 403. DOI: 10.1007/s10661-025-13856-3.

[4]	Ang YK, Talei A, Zahidi I, et al. 2023. Past, present, and future of using neuro-fuzzy systems for hydrological modeling and forecasting. Hydrology, 10(2): 36. DOI: 10.3390/hydrology10020036.

[5]	Aryafar A, Khosravi V, Karami, S. 2020. Groundwater quality assessment of Birjand plain aquifer using kriging estimation and sequential Gaussian simulation methods. Environmental Earth Sciences, 79: 1−21. DOI: 10.1007/s12665-020-08905-8.

[6]	Asadolahi M, Akbari MG, Hesamian G, et al. 2021. A robust support vector regression with exact predictors and fuzzy responses. International Journal of Approximate Reasoning, 132: 206−225. DOI: 10.1016/j.ijar.2021.02.006.

[7]	Bardossy A, Duckstein L. 2022. Fuzzy rule-based modeling with applications to geophysical, biological, and engineering systems. CRC Press, London, United Kingdom. DOI: 10.1201/9780138755133

[8]	Bhadani V, Singh A, Kumar, V, et al. 2024. Nature-inspired optimal tuning of input membership functions of a fuzzy inference system for groundwater level prediction. Environmental Modelling and Software, 175: 105995. DOI: 10.1016/j.envsoft.2024.105995.

[9]	Bogardi I, Bardossy A, Duckstein L. 1983. Regional management of an aquifer for mining under fuzzy environmental objectives. Water Resources Research, 19(6): 1394−1402. DOI: 10.1029/WR019i006p01394.

[10]	Boo KBW, El-Shafie A, Othman F, et al. 2024. Groundwater level forecasting using ensemble coactive neuro-fuzzy inference system. Science of The Total Environment, 912: 168760. DOI: 10.1016/j.scitotenv.2023.168760.

[11]	Borzì I. 2025. Modeling groundwater resources in data-scarce regions for sustainable management: Methodologies and limits. Hydrology, 12(1): 2306−5338. DOI: 10.3390/hydrology12010011.

[12]	Condon LE, Kollet S, Bierkens MF, et al. 2021. Global groundwater modeling and monitoring: Opportunities and challenges. Water Resources Research, 57(12): e2020WR029500. DOI: 10.1029/2020WR029500.

[13]	Feng F, Ghorbani H, Radwan AE. 2024. Predicting groundwater level using traditional and deep machine learning algorithms. Frontiers in Environmental Science, 12: 1291327. DOI: 10.3389/fenvs.2024.1291327.

[14]	Hoogesteger J. 2022. Regulating agricultural groundwater use in arid and semi-arid regions of the Global South: Challenges and socio-environmental impacts. Current Opinion in Environmental Science and Health, 100341. DOI: 10.1016/j.coesh.2022.100341

[15]	Hu H, Yang K, Sharma A, et al. 2020. Assessment of water and energy scarcity, security and sustainability into the future for the Three Gorges Reservoir using an ensemble of RCMs. Journal of Hydrology, 586: 124893. DOI: 10.1016/j.jhydrol.2020.124893.

[16]	Kambalimath S, Deka PC. 2020. A basic review of fuzzy logic applications in hydrology and water resources. Applied Water Science, 10(8): 1−14. DOI: 10.1007/s13201-020-01276-2.

[17]	Koonce JE. 2016. Water balance and moisture dynamics of an arid and semi-arid soil: A weighing lysimeter and field study. DOI: 10.34917/9112095

[18]	Kuang X, Liu J, Scanlon BR, et al. 2024. The changing nature of groundwater in the global water cycle. Science, 383(6686): eadf0630. DOI: 10.1126/science.adf0630.

[19]	Lall U, Josset L, Russo T. 2020. A snapshot of the world's groundwater challenges. Annual Review of Environment and Resources, 45(1): 171−194. DOI: 10.1146/annurev-environ-102017-025800.

[20]	Loucks DP. 2023. Hydroinformatics: A review and future outlook. Cambridge Prisms: Water, 1: e10. DOI: 10.1017/wat.2023.10.

[21]	Malakar P, Bhanja SN, Dash AA, et al. 2022. Delineating variabilities of groundwater level prediction across the agriculturally intensive transboundary aquifers of South Asia. ACS ES& T Water, 3(6): 1547−1560. DOI: 10.1021/acsestwater. 2c00220.

[22]	Moges E, Demissie Y, Larsen L, et al. 2021. Sources of hydrological model uncertainties and advances in their analysis. Water, 13(1): 28. DOI: 10.3390/w13010028.

[23]	NavaleV, Mhaske S. 2023. Artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) model for Forecasting groundwater level in the Pravara River Basin, India. Modeling Earth Systems and Environment, 9(2): 2663−2676. DOI: 10.1007/s40808-022-01639-5.

[24]	Rahimi M, Ebrahimi H. 2023. Data-driven driven to underground water level using artificial intelligence hybrid algorithms. Scientific Reports, 13(1): 10359. DOI: 10.1038/s41598-023-35255-9.

[25]	Ramezani N. 2022. Modern statistical modeling in machine learning and big data analytics: Statistical models for continuous and categorical variables. In research anthology on machine learning techniques, methods, and applications (pp. 90−106). IGI Global. DOI: 10.4018/978-1-6684-6291-1.ch007.

[26]	Rashidi GH, Katibeh H, Maleki A. 2024. Forecasting groundwater fluctuations caused by earthquakes using fuzzy logic and AHP Method: A case study from Iran. Earth Science Informatics, 17(3): 2143−2158. DOI: 10.1007/s12145-024-01264-z.

[27]	Saikrishnamacharyulu I, Mohanta NR, Kumar MH, et al. 2022. Simulation of water table depth using a hybrid CANFIS model: A Case study. In Intelligent System Design: Proceedings of INDIA 2022 (pp. 319−328). Singapore: Springer Nature Singapore. DOI: 10.1007/978-981-19-4863-3_30.

[28]	Samani S, Vadiati M, Azizi F, et al. 2022. Groundwater level simulation using soft computing methods with emphasis on major meteorological components. Water Resources Management, 36(10): 3627−3647. DOI: 10.1007/s11269-022-03217-x.

[29]	Samantaray S, Biswakalyani C, Singh DK, et al. 2022. Prediction of groundwater fluctuation based on a hybrid ANFIS-GWO approach in an arid Watershed, India. Soft Computing, 26(11): 5251−5273. DOI: 10.1007/s00500-022-07097-6.

[30]	Samantaray S, Sahoo A, Baliarsingh F. 2024. Groundwater level prediction using an improved SVR model integrated with hybrid particle swarm optimization and firefly algorithm. Cleaner Water, 1: 100003. DOI: 10.1016/j.clwat.2024.100003.

[31]	Sun J, Hu L, Li D, et al. 2022. Data-driven models for accurate groundwater level prediction and their practical significance in groundwater management. Journal of Hydrology, 608: 127630. ‏ DOI: 10.1016/j.jhydrol.2022.127630

[32]	Tao H, Hameed MM, Marhoon HA, et al. 2022. Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing, 489: 271−308. DOI: 10.1016/j.neucom.2022.03.014.

[33]	Theodoridou PG, Varouchakis EA, Karatzas GP. 2017. Spatial analysis of groundwater levels using fuzzy logic and geostatistical tools. Journal of Hydrology, 555: 242−252. DOI: 10.1016/j.jhydrol.2017.10.027.

[34]	Varouchakis EA, Theodoridou PG, Karatzas GP. 2019. Spatiotemporal geostatistical modeling of groundwater levels under a Bayesian framework using means of physical background. Journal of Hydrology, 575: 487−498. DOI: 10.1016/j.jhydrol.2019.05.055.

[35]	Yang X, Zhang Z. 2022. A CNN-LSTM model based on a meta-learning algorithm to predict groundwater level in the middle and lower reaches of the Heihe River, China. Water, 14(15): 2377. DOI: 10.3390/w14152377.

[36]	Zaresefat M, Derakhshani R. 2023. Revolutionizing groundwater management with hybrid AI models: A practical review. Water, 15(9): 1750. DOI: 10.3390/w15091750.

[37]

Zarinmehr H, Tizro AT, Fryar AE, et al. 2022. Prediction of groundwater level variations based on gravity recovery and climate experiment (GRACE) satellite data and a time-series analysis: a case study in the Lake Urmia basin, Iran. Environmental Earth Sciences, 81(6): 180. DOI: 10.1007/s12665-022-10296-x.

[38]	Zowam FJ, Milewski AM. 2024. Groundwater level prediction using machine learning and geostatistical interpolation models. Water, 16(19): 2771. DOI: 10.3390/w16192771.

RIGHTS & PERMISSIONS

Journal of Groundwater Science and Engineering Editorial Office

PDF (1652KB)

663

Accesses

Citation

Detail

Sections

Recommended

About the journal

Browse

Authors & reviewers

Abstract

Graphical abstract

Keywords

Cite this article

Introduction

1 Study area

2 Materials and methods

2.1 Data setsand quality control

2.2 Model development for GWL prediction

2.3 Model development

2.3.1 Basic concepts on fuzzy numbers

2.3.2 Fuzzy Least Squares Regression Model (FLSR)

2.3.3 Fuzzy Nonlinear Additive Regression (FNAR) model

2.4 Validation of models

3 Results and discussion

3.1 Model performance

3.2 Temporal patterns and seasonal performance variations

3.3 Comparison with State-of-the-Art GWL models

3.4 Advantages, limitations, and future direction

4 Conclusion

References

RIGHTS & PERMISSIONS