Utilizing machine learning models to grasp water quality dynamic changes in lake eutrophication through phytoplankton parameters

Yong Fang, Ruting Huang, Yeyin Zhang, Jun Zhang, Wenni Xi, Xianyang Shi

Front. Environ. Sci. Eng. ›› 2025, Vol. 19 ›› Issue (2) : 14.

Front. Environ. Sci. Eng. ›› 2025, Vol. 19 ›› Issue (2) : 14. DOI: 10.1007/s11783-025-1934-6
RESEARCH ARTICLE

Highlights

● Accurate identification of lake eutrophication was achieved via ML models.

● The XGBoost model shows superior performance in identifying limiting nutrients.

● The LightGBM model effectively uses phytoplankton parameters for water quality characterization.

● ML model with TN/TP ratio and phytoplankton can track lake eutrophication dynamics.

Abstract

Phytoplankton serve as vital indicators of eutrophication levels. However, relying solely on phytoplankton parameters such as chlorophyll-a limits our understanding of complex eutrophication conditions in natural lakes, particularly the timely analysis of changes in limiting nutrients and their concentrations. This study presents machine learning (ML) models for predicting and identifying lake eutrophication. Five tree-based ML models were developed using recent hydrological, water quality, and meteorological data obtained from 34 sites in the Huating Lake basin over 5 months. The extreme gradient boosting (XGBoost) model predicted the total nitrogen/total phosphorus ratio (TN/TP) with high accuracy (R² = 0.88; RMSE = 24.60; MAPE = 26.14%). Analysis of the TN/TP ratio and the model's feature importance weights revealed that phosphorus plays a crucial role in eutrophication, probably owing to the low-flow, deep-water characteristics of the basin. Furthermore, the light gradient boosting machine (LightGBM) model predicted phytoplankton parameters with high accuracy, especially the Shannon index (H′) (R² = 0.92; RMSE = 0.11; MAPE = 4.95%). The mesotrophic classification of Huating Lake determined using the H′ threshold coincided with the findings of the H′ analysis. Future research should cover a wider range of pollution sources and spatiotemporal dimensions to further validate these findings. Overall, this study highlights the potential of incorporating the TN/TP ratio and phytoplankton parameters into ML techniques for effective monitoring and management of environmental conditions.

Keywords

Machine learning / Lake / Phytoplankton / Water quality

Cite this article

Yong Fang, Ruting Huang, Yeyin Zhang, Jun Zhang, Wenni Xi, Xianyang Shi. Utilizing machine learning models to grasp water quality dynamic changes in lake eutrophication through phytoplankton parameters. Front. Environ. Sci. Eng., 2025, 19(2): 14 https://doi.org/10.1007/s11783-025-1934-6

1 Introduction

Phytoplankton are essential for maintaining the health and functionality of lake ecosystems. Changes in their composition and relative proportions directly affect the structure, function, and stability of aquatic ecosystems (Derot et al., 2020). Excessive phytoplankton proliferation resulting from lake eutrophication can disrupt ecosystem stability and reduce biodiversity (Znachor et al., 2020). Ensuring the sustainable development of ecosystems requires identifying and predicting eutrophication trends in lakes by considering phytoplankton dynamics (Conley et al., 2009).
The biomass and community structure of phytoplankton are intricately linked to various environmental factors (Shan et al., 2019; Uddin et al., 2024a). Many scholars have investigated the effects of physicochemical water quality parameters, such as nutrients, pH, water temperature (T), and transparency, on phytoplankton (Muhid et al., 2013; Carrasco Navas-Parejo et al., 2020). Recent studies on lake water quality have identified key indicators and revealed significant associations between parameters such as total nitrogen (TN), total phosphorus (TP), and chlorophyll-a (Chl-a) and the trophic status of lakes. Evidence indicates that temperature and nutrients significantly promote the proliferation of cyanobacteria within phytoplankton communities (Carrasco Navas-Parejo et al., 2020). For example, high year-round solar radiation, temperature, and nutrient supply in tropical estuaries sustain high phytoplankton productivity (Carrasco Navas-Parejo et al., 2020). Experiments have shown that nitrogen (N) and phosphorus (P) additions can significantly affect phytoplankton biomass, growth, and species composition (Muhid et al., 2013).
Phytoplankton parameters are key indicators of eutrophication (Derot et al., 2020). Xiong et al. (2022) used a nutrient-driven dynamic eutrophication model based on a Bayesian hierarchical framework and spatial ecosystem bifurcation analysis to investigate the impact of nutrient and phytoplankton parameters on lake eutrophication. Derot et al. (2020) also analyzed the response of phytoplankton traits to various environmental variables and identified relevant traits for developing future indicators. However, these studies may misidentify complex eutrophication conditions in natural lakes because they largely focused on single phytoplankton parameters (e.g., Chl-a), did not combine multisource hydrological, water quality, and meteorological data, and relied on simple statistical analyses of physicochemical parameters (Muhid et al., 2013; Shan et al., 2019).
The Shannon (H′), Margalef (H), Simpson (D), and Pielou (J) indices are other key phytoplankton parameters that can be used to assess phytoplankton community diversity and eutrophication levels in aquatic ecosystems (Li et al., 2022). Meng et al. (2020) investigated the response of phytoplankton α-diversity to changes in nutrient levels in the Songhua River and revealed its potential as an indicator of aquatic habitat health. Although another study conducted in the Jiulong River Estuary and Jingpo Lake acknowledged the role of phytoplankton diversity indices in indicating trophic changes, it did not clarify which indices are most effective for predicting eutrophication. This lack of specificity across studies reveals a knowledge gap regarding the most appropriate phytoplankton parameters for accurate eutrophication prediction in lakes (Ge et al., 2022). Furthermore, the TN/TP ratio is a primary determinant of which nutrient limits phytoplankton growth and improves our understanding of the composition and structure of phytoplankton communities in aquatic ecosystems (Qin et al., 2020). Although Jiang and Nakano (2022) emphasized the impact of trophic status on N and P, the primary nutrients limiting phytoplankton growth and function in aquatic ecosystems, studies integrating TN/TP ratios with phytoplankton parameters to systematically investigate lake eutrophication are lacking.
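The diversity indices named above are computed from per-taxon cell counts. The minimal Python sketch below assumes common textbook definitions: Shannon H′ = −Σ pᵢ log pᵢ (natural log by default), Simpson D = Σ pᵢ² in its dominance form (so 1/D is the inverse Simpson index), Pielou J = H′/log S, and Margalef (S − 1)/ln N, which this paper denotes H. The function name and these conventions are assumptions rather than the authors' code; some studies compute H′ with log₂, which changes its absolute values and therefore any thresholds applied to it.

```python
import math
from collections.abc import Sequence


def diversity_indices(counts: Sequence[float], log_base: float = math.e) -> dict:
    """Alpha-diversity indices from per-taxon abundances (assumed definitions)."""
    abundances = [c for c in counts if c > 0]  # absent taxa do not contribute
    n_total = sum(abundances)
    s = len(abundances)  # species richness S
    if s < 2 or n_total <= 1:
        raise ValueError("need at least two taxa and more than one individual")
    p = [c / n_total for c in abundances]  # relative abundances p_i
    shannon = -sum(pi * math.log(pi, log_base) for pi in p)
    simpson = sum(pi ** 2 for pi in p)  # dominance form of Simpson's index
    return {
        "shannon_H_prime": shannon,
        "simpson_D": simpson,
        "inverse_simpson": 1.0 / simpson,
        "pielou_J": shannon / math.log(s, log_base),  # evenness relative to maximum H'
        "margalef": (s - 1) / math.log(n_total),  # richness, always natural log
    }
```

For a perfectly even community, for example four taxa with 10 cells each, J equals 1 and 1/D equals the number of taxa.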
The complex morphologies of lakes, together with human activities and weather perturbations in real-world settings, make it challenging to reveal the relationship between TN/TP ratios and phytoplankton parameters and thereby to accurately identify and predict lake eutrophication trends (Feng et al., 2021). Many tools and techniques have been developed for water quality assessment. However, recent studies have highlighted that models used to assess lake water quality may introduce significant uncertainty during modeling (Rezaie-Balf et al., 2020; Fortes et al., 2023; Uddin et al., 2023b). Machine learning (ML) models provide a powerful framework for addressing these challenges, and some researchers have employed ML techniques to reduce the uncertainty of lake water quality prediction models and achieve accurate predictions (Ding et al., 2023; Georgescu et al., 2023). The literature indicates that ensemble tree-based algorithms, such as eXtreme Gradient Boosting (XGBoost) and Random Forest (RF), predict lake water quality accurately (Uddin et al., 2022b; Georgescu et al., 2023; Uddin et al., 2023a). In addition, Xiong et al. (2019) used the XGBoost ML framework to establish a correlation between the phytoplankton algal index and TP in phytoplankton-dominated aquatic environments, demonstrating the effectiveness of ML techniques for lake eutrophication assessment. Liu et al. (2022) also demonstrated that an ML model trained on a small data set can predict odor concentrations in laying hen houses. Such approaches markedly enhance our capacity to evaluate and understand lake eutrophication and offer a promising avenue for future research and environmental monitoring.
Over five months, we gathered hydrological, water quality, and meteorological data from 34 observation points within the Huating Lake basin. This comprehensive data set enabled the development of five tree-based ML models: decision tree (DT), RF, XGBoost, categorical boosting (CatBoost), and light gradient boosting machine (LightGBM). The primary objectives were 1) to identify the optimal ML model for predicting TN/TP and delineating changes in the lake's trophic levels, 2) to use the ML models to discern the limiting nutrients driving eutrophication in the lake, and 3) to employ the ML models to track changes in water quality by analyzing variations in phytoplankton parameters. This study presents a new approach to forecasting and analyzing changes in lake water quality using ML models and may offer a new perspective on lake eutrophication management.

2 Materials and methods

2.1 Study area

The study area is the Huating Lake basin (30°26′–30°38′ N, 115°58′–116°18′ E), located at the southern foot of the Dabie Mountains on the northern bank of the Yangtze River in Anhui Province, China (Table S1). Sampling points were selected based on relevant anthropogenic and environmental factors within the study area. The Huating Lake basin has four distinct seasons and an annual average T of 16.4 °C. Annual average sunshine is 1938 h, the frost-free period is 249 d, annual average rainfall is 1368.4 mm, the dominant wind directions are northwest and southeast, and the annual average wind speed is 3.2 m/s. The basin covers 1870 km², with an average annual inflow of 46.3 m³/s, an average annual incoming water volume of 1.46 × 10⁹ m³, a total reservoir capacity of 2.366 × 10⁹ m³, and a normal storage level of 88.00 m. Huating Lake serves as a critical water source for the surrounding region. It is primarily fed by the Dianqian, Anle, Qingshi, Shuyan, and Siqian Rivers, with average annual inflows of 6.0, 6.0, 7.6, 8.0, and 12.0 m³/s, respectively, and drains into the Wan River and eventually the Yangtze River (Feng, 2007). The catchments of the inflowing rivers are densely populated, with intensive livestock and poultry farming and extensive agriculture, and untreated pollutants are frequently discharged into the lake through tributaries. In recent decades, eutrophication of the lake has gradually intensified, closely associated with rapid population growth and accelerated economic development, with particularly severe effects during low-water periods. The basin receives agricultural, industrial, and domestic pollution, and nutrient levels vary widely, making it an ideal location for studying the dynamics of water quality changes in eutrophic lakes.

2.2 Data acquisition and laboratory analysis

This study focused on 34 sampling points in Huating Lake, and the resulting data set encompassed meteorological, physicochemical, and phytoplankton data (Table S1). Sampling was performed bi-monthly; each campaign lasted 2 d and covered 29 parameters. Seasonal monitoring at these 34 locations was performed in June, July, and September 2022 and in March and April 2023, as detailed in Tables S1 and S2. These months were selected to reflect the periods before, during, and after the onset of eutrophication in the lake, to capture a range of values of water quality parameters such as TP and TN, and to evaluate the predictive ability and robustness of the models under different meteorological (rainy and dry season) conditions and trophic scenarios.
In this study, we employed a YSI multiparameter water quality sonde (YSI 6600 V2, Yellow Springs Instruments, USA) for in situ measurement of key water quality parameters, including T, pH, dissolved oxygen (DO), electrical conductivity (EC), and redox potential (ORP). Secchi depth (SD) was measured using a Secchi disk (SD20, Beijing Purity Instrument Co., Ltd., China), and water flow velocity (v) was measured using a Siemens SITRANS FUS1010 ultrasonic flow meter (Siemens, Germany). Before each measurement, the instruments were calibrated according to the manufacturers' instructions, and three replicate measurements were taken at each sampling point and averaged to minimize errors. All recorded data were double-checked for completeness and accuracy. Deionized water was used as a field blank during sampling to detect potential contamination. These measures ensured that the water quality data were accurate and reliable, providing a solid basis for subsequent analysis and model predictions.
At each sampling point, three surface water samples (0–50 cm deep) were collected in three 1-L polyethylene (PP) bottles and mixed for laboratory analysis. The bottles were pre-rinsed with sample water before use to minimize the potential influence of the container on the samples. Samples for physicochemical analysis were immediately acidified with concentrated sulfuric acid (H₂SO₄) (98% purity, Sinopharm Chemical Reagent Co., Ltd., China) to pH 1–2 upon collection to stabilize them and prevent chemical changes. Water samples for phytoplankton identification were fixed with 10 mL of Lugol's iodine solution (5%, Mreda, China), stored in the dark at 2–5 °C, and processed promptly upon arrival at the laboratory (Xiong et al., 2019). In the laboratory, the concentrations of TN, ammonia nitrogen (NH4–N), TP, chemical oxygen demand (CODMn), and active phosphate (AP) were determined following Standard Methods for the Examination of Water and Wastewater (Jenkins, 1982). To ensure the accuracy of these tests, laboratory instruments were calibrated regularly and quality was controlled using standard samples. Microcystin concentration (MC) was determined using a validated high-performance liquid chromatography method to ensure reliable and repeatable results (Xiong et al., 2022).
For phytoplankton identification, phytoplankton were separated from the water samples, and their morphology and structure were observed and identified by microscopy (Behrenfeld et al., 2021). Samples were examined repeatedly to ensure precise identification and were compared with known reference specimens. Meteorological data were obtained from the National Meteorological Data Center.

2.3 Data analysis and maximal information coefficient

During data analysis, the R package "outliers" was used to identify and handle outliers in the data set. For statistical analysis, we used the R packages "stats" for basic statistical tests, "dplyr" for data manipulation, and "ggplot2" for data visualization. First, descriptive statistics (mean, minimum, and maximum) were computed for all parameters. To ensure complete and reliable data analysis, potential outliers were identified using the interquartile range (IQR) method, which is widely recognized as robust for detecting true anomalies. In addition, following recent advances in outlier detection, we applied the isolation forest technique recommended in recent studies (Uddin et al., 2024b). No outliers were found in our data set. All data were log10-transformed before analysis to satisfy the assumptions of normality and homogeneity of variance. Ward's hierarchical cluster analysis (CA), based on the observed means and squared Euclidean distances over the study period, was used to determine spatial groupings. The dendrogram axis was rescaled to a percentage, (Dlink/Dmax) × 100 (Singh et al., 2005), so that each linkage distance is expressed relative to the maximum distance (Fig. S1). The parameters used for CA included physical (T, pH, v, EC, ORP, and SD), chemical (TN, TP, TN/TP, NH4–N, DO, CODMn, and AP), and biological parameters (MC, H′, H, J, and D) (Table S2). In addition, threshold division of TN/TP was used to examine the influence of nutrient parameters on lake eutrophication. Threshold division was also applied to the phytoplankton parameters H and H′ to assess the eutrophication status of the lake, and the final status was assigned according to the higher of the resulting eutrophication levels (Tables S3 and S4).
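The log10 transform, IQR screening, and isolation forest check described above can be sketched in Python as follows. The authors worked in R; this is an illustration only, assuming strictly positive data (so log10 is defined), the conventional 1.5 × IQR fences, and scikit-learn's IsolationForest with its default "auto" contamination. The helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def iqr_outlier_mask(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], column by column."""
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def screen_outliers(data: np.ndarray, random_state: int = 0):
    """Return (per-cell IQR flags, per-sample isolation-forest flags) on log10 data.

    Rows are samples, columns are parameters; all values must be > 0.
    """
    logged = np.log10(data)
    iqr_flags = iqr_outlier_mask(logged)
    forest = IsolationForest(contamination="auto", random_state=random_state)
    iso_flags = forest.fit_predict(logged) == -1  # -1 marks anomalous samples
    return iqr_flags, iso_flags
```

Using both checks is a cross-validation of sorts: IQR fences judge each parameter separately, whereas the isolation forest scores whole samples in the joint parameter space.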
One-way analysis of variance (ANOVA) was used to determine whether the physicochemical parameters differed significantly among sampling locations and sampling times. For groups in which significant differences were found, Fisher's least significant difference (LSD) test was then used to determine which groups differed significantly from each other.
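Fisher's LSD reuses the pooled within-group mean square error (MSE) of the ANOVA, and pairwise comparisons are made only when the ANOVA itself is significant. A minimal Python sketch with SciPy follows; the authors' analysis was done in R, and the function name and the dictionary-of-groups input (for example site clusters or sampling months) are assumptions for illustration.

```python
import itertools

import numpy as np
from scipy import stats


def anova_with_fisher_lsd(groups: dict, alpha: float = 0.05):
    """One-way ANOVA followed, if significant, by Fisher's LSD pairwise tests.

    `groups` maps a group label to a 1-D array of observations.
    Returns (anova_p, {(a, b): lsd_p}); the dict is empty if ANOVA is not significant.
    """
    labels = list(groups)
    samples = [np.asarray(groups[g], dtype=float) for g in labels]
    _, anova_p = stats.f_oneway(*samples)
    if anova_p >= alpha:
        return float(anova_p), {}
    # Pooled within-group variance (MSE) and its degrees of freedom from the ANOVA table
    n_total = sum(len(s) for s in samples)
    df_within = n_total - len(samples)
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    mse = ss_within / df_within
    pairwise = {}
    for (a, sa), (b, sb) in itertools.combinations(zip(labels, samples), 2):
        se = np.sqrt(mse * (1 / len(sa) + 1 / len(sb)))
        t_stat = (sa.mean() - sb.mean()) / se
        pairwise[(a, b)] = float(2 * stats.t.sf(abs(t_stat), df_within))  # two-sided
    return float(anova_p), pairwise
```

Because the LSD step applies no multiplicity correction, gating it on a significant ANOVA is what keeps the family-wise error rate in check.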
Irrelevant input attributes can introduce model uncertainty or bias into final evaluations, making input selection a critical component of model development (Uddin et al., 2024a). Suboptimal model performance can often be attributed to improper input selection, with multicollinearity among parameters a prime example (Yu et al., 2015; Zhang et al., 2022). The literature therefore emphasizes assessing multicollinearity before model development (Yu et al., 2015; Uddin et al., 2024a). Multicollinearity is typically identified using tools and techniques such as the variance inflation factor, Pearson correlation, and Spearman correlation (Uddin et al., 2024a). Increasing the number of input features in ML models and including collinear (i.e., highly correlated) variables can bias prediction results (Reddy et al., 2020). Considering applicability, whether the data require normalization, computational complexity, and robustness, we employed the maximal information coefficient (MIC) algorithm for feature screening. MIC quantifies the association between two variables by computing their mutual information over grids of different resolutions and taking the maximum normalized value. In addition, a MIC threshold was selected to classify feature relevance to the target parameters (high vs. low correlation) so as to retain as much information from valid features as possible (Reddy et al., 2020). MIC was used to assess the strength of the correlation between parameters, and the Pearson correlation coefficient was used to determine its direction (positive or negative).
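A rough Python approximation of MIC is sketched below, assuming the default grid budget nx × ny ≤ n^0.6 of the original MIC definition. Unlike the full algorithm, it searches only equal-frequency grids instead of optimizing the partition boundaries, so it can underestimate the true MIC; a dedicated implementation (for example the minepy package) is preferable in practice. As in the text, the sign of Pearson's r supplies the direction. All function names are hypothetical.

```python
import numpy as np


def _equal_freq_bins(v: np.ndarray, k: int) -> np.ndarray:
    """Assign each value to one of k (near) equal-frequency bins via its rank."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * k) // len(v)


def _mutual_info_bits(bx: np.ndarray, by: np.ndarray, nx: int, ny: int) -> float:
    """Mutual information (in bits) between two discretized variables."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (bx, by), 1.0)  # contingency table of bin pairs
    joint /= joint.sum()
    expected = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / expected[nz])))


def approx_mic(x, y, alpha: float = 0.6) -> float:
    """Max of MI / log2(min(nx, ny)) over equal-frequency grids with nx * ny <= n**alpha."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** alpha
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            mi = _mutual_info_bits(_equal_freq_bins(x, nx), _equal_freq_bins(y, ny), nx, ny)
            best = max(best, mi / np.log2(min(nx, ny)))  # normalized to [0, 1]
    return float(best)


def mic_with_direction(x, y) -> tuple:
    """Strength from (approximate) MIC, direction from the sign of Pearson's r."""
    r = np.corrcoef(x, y)[0, 1]
    return approx_mic(x, y), int(np.sign(r))
```

A noiseless monotone relationship scores 1 in either direction, while independent variables score close to 0; the Pearson sign then distinguishes the two monotone cases.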

2.4 Model construction and evaluation

Five tree-based learning models (DT and the ensemble models RF, XGBoost, CatBoost, and LightGBM) were developed using the scikit-learn package in Python 3.9 (see Text S1 for a comprehensive description, rationale, and limitations of the model comparison). The models used the feature parameters retained by MIC screening (see Fig.1 for the development process). For model development, the full data set (144 samples) was randomly split into training (70%) and test (30%) subsets. A hierarchical k-fold cross-validation approach (5 folds, 100 replications) was used to determine the number of trees for all models except DT, to avoid overfitting and minimize sampling bias (Uddin et al., 2023b). Twenty trees were sufficient; using more trees did not significantly improve model performance. After the number of trees was determined, the hyperparameters of each model were optimized using k-fold (k = 5) cross-validation. The optimized hyperparameter values and average computation time for each model are detailed in Tables S5–S9.
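The split-and-tune procedure can be sketched with scikit-learn as below. This is an illustration under stated assumptions: RandomForestRegressor stands in for all the ensembles (XGBoost, CatBoost, and LightGBM provide their own scikit-learn-compatible regressors, not used here), plain RepeatedKFold replaces the hierarchical scheme, and the hyperparameter grid and helper names are hypothetical.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import (GridSearchCV, KFold, RepeatedKFold,
                                     cross_val_score, train_test_split)


def split_70_30(X, y, seed=0):
    """Random 70/30 train/test split, as described in the text."""
    return train_test_split(X, y, test_size=0.3, random_state=seed)


def select_n_trees(X, y, candidates=(10, 20, 50), n_repeats=100, seed=0):
    """Mean cross-validated R^2 for each candidate number of trees (repeated 5-fold CV)."""
    cv = RepeatedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    return {
        n: float(cross_val_score(RandomForestRegressor(n_estimators=n, random_state=seed),
                                 X, y, cv=cv, scoring="r2").mean())
        for n in candidates
    }


def tune_hyperparameters(X, y, n_trees, seed=0):
    """Grid-search a few hyperparameters (hypothetical grid) with 5-fold CV."""
    grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 3]}
    search = GridSearchCV(RandomForestRegressor(n_estimators=n_trees, random_state=seed),
                          grid, cv=KFold(n_splits=5, shuffle=True, random_state=seed),
                          scoring="r2")
    return search.fit(X, y)
```

With 144 samples, the 70/30 split yields 100 training and 44 test samples; the number of trees is fixed first and the remaining hyperparameters are then tuned only on the training subset, so the test subset stays untouched until final evaluation.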
Fig.1 Flowchart of the model development and evaluation, followed by feature importance analyses.

The predictive performance of the models was evaluated using the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²); a comprehensive description of these metrics is provided in Text S2. The best model was determined by evaluating each metric (Hu et al., 2023). In addition, we shifted the output values from the next two time intervals to the input values to predict the levels of TN/TP and phytoplankton parameters in lake waters.
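The five metrics follow standard definitions, sketched below in NumPy (the function name is hypothetical). Two practical notes follow from these definitions: RMSE is simply √MSE (for example, √605.01 ≈ 24.60, matching the XGBoost TN/TP results reported in Section 3.3), and MAPE is undefined when an observed value is zero and is inflated by very small observations, which matters for a target such as TN/TP whose minimum was 0.023.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MAE, MSE, RMSE, MAPE (%), and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "MAPE_%": float(100 * np.mean(np.abs(err / y_true))),  # undefined if any y_true == 0
        "R2": float(1 - ss_res / ss_tot),
    }
```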
The model that performed best across the three metrics (R², RMSE, and MAPE) was selected to explore the key target parameters. Feature weights for DT and the four ensemble tree-based models were calculated as impurity-based (Gini) importances, obtained from the feature importance attribute in the scikit-learn package (Uddin et al., 2022a). Finally, data from April 2023 were used as a validation set to revalidate the optimal models for the different target parameters, allowing assessment of how well the constructed models predict unseen data.
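Impurity-based importance, exposed by scikit-learn tree models as the feature_importances_ attribute, sums the reduction in node impurity contributed by each feature across all splits and normalizes the result to 1. For regression trees the impurity is the squared error rather than the Gini index, although the measure is still widely called "Gini importance". A minimal sketch with a random forest and hypothetical feature names:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_features(X, y, names, n_trees=20, seed=0):
    """Rank features by mean decrease in impurity (feature_importances_)."""
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed).fit(X, y)
    importances = model.feature_importances_  # normalized to sum to 1
    order = np.argsort(importances)[::-1]  # most important first
    return [(names[i], float(importances[i])) for i in order]
```

Impurity-based importances can favor high-cardinality continuous features, so permutation importance is a common cross-check when rankings drive interpretation.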

3 Results and discussion

3.1 Spatiotemporal variation in water quality parameters

According to CA (Fig. S1), the sampling sites in Huating Lake were divided into four groups: LS-1 (L1–L6 and L14), LS-2 (L7–L13), LS-3 (L15–L28), and LS (L1–L28, the entire lake). The data characteristics of the groups differed (Table S10). One-way ANOVA and Fisher's LSD tests were used to identify significant differences between the groups (Fig.2 and Fig.3). The rivers entering the lake were likewise classified into three groups: RS-1 (Siqian River), RS-2 (Dianqian, Anle, Qingshi, and Shuyan Rivers), and RS (Dianqian, Anle, Qingshi, Shuyan, and Siqian Rivers) (Table S11). Monthly values of the parameters in the lake and its inflowing rivers fluctuated up and down, with nutrient parameters most often peaking in September. Compared with the eutrophication thresholds for TN (0.58 mg/L) and TP (0.029 mg/L) (Table S3), the Huating Lake basin is eutrophic. For TP, the maximum value occurred at LS-2 in September 2022 (0.141 mg/L) and the minimum at LS-3 in March 2023 (0.003 mg/L), with a whole-lake mean of 0.052 mg/L. For the inflow rivers, the maximum occurred at RS-2 in June 2022 (0.09 mg/L), and the mean was 0.061 mg/L, 17.3% higher than the lake mean (Fig.2(a)). For TN, the maximum value occurred at LS-2 in September 2022 (3.32 mg/L) and the minimum at LS-3 in June 2022 (0.015 mg/L), with a whole-lake mean of 0.94 mg/L. For the inflow rivers, the maximum occurred at RS-2 in September 2022 (1.46 mg/L), and the mean was 1.12 mg/L, 19.2% above the lake mean (Fig.2(b)).
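The site grouping above comes from Ward's cluster analysis with the dendrogram distances rescaled as (Dlink/Dmax) × 100, as described in Section 2.3. A minimal SciPy sketch follows, assuming a sites × parameters matrix of period means. Standardizing each parameter (z-scores) is an assumption here, and SciPy's Ward linkage works from Euclidean distances, so absolute heights differ from software that reports squared Euclidean distances, while the percentage rescaling is the same operation. The function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore


def ward_site_clusters(site_means: np.ndarray, n_clusters: int):
    """Ward clustering of sites; returns (cluster labels, linkage with distances in %)."""
    z = zscore(site_means, axis=0)  # standardize each parameter (assumption)
    link = linkage(z, method="ward", metric="euclidean")
    scaled = link.copy()
    scaled[:, 2] = 100.0 * link[:, 2] / link[:, 2].max()  # (Dlink / Dmax) * 100
    labels = fcluster(link, t=n_clusters, criterion="maxclust")
    return labels, scaled
```

The rescaled matrix can be passed to scipy.cluster.hierarchy.dendrogram so that the vertical axis reads as a percentage of the largest linkage distance.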
For the TN/TP ratio, which indicates the nutrient limiting lake eutrophication, the maximum value occurred at LS-1 in February 2023 (453.33) and the minimum at LS-3 in September 2022 (0.023), with a whole-lake mean of 68.15. For the inflow rivers, the maximum occurred at RS-2 in September 2022 (145.23), and the mean was 44.35 (Fig.2(c)). These results indicate that eutrophication in the Huating Lake basin is limited by P rather than N. Overall, the Huating Lake basin exhibits significant spatiotemporal differences in water quality.
Fig.2 Spatiotemporal variation in environmental chemical parameters (TP, TN, TN/TP, and MC) in the Huating Lake basin during June, July, and September 2022 and March 2023. (a) TP; (b) TN; (c) TN/TP; (d) MC.

Fig.3 Spatiotemporal variation in environmental biological parameters (algae cell density (ACD), H′, J, D, 1/D, and H) in the Huating Lake basin during June, July, and September 2022 and March 2023. (a) ACD; (b) H′; (c) J; (d) D; (e) 1/D; (f) H.

This outcome can be attributed to substantial N and P inputs. Within the basin, anthropogenic activities such as agriculture, industrial operations, and urban emissions may influence the quality of river inflows (Li et al., 2022). The inflowing rivers have higher P concentrations than the lake, facilitating P accumulation and enrichment within the lake and promoting eutrophication (Li et al., 2021). Nitrogen input and cycling processes may also have affected these findings. In this N-rich basin, N inputs and cycling may differ from those of P: N undergoes various processes, including biotransformation and gas exchange, which lead to greater N fluctuations (Howarth and Marino, 2006), whereas P has a greater propensity to accumulate and become enriched. In addition, the degree of P enrichment in lakes depends on the balance among input, outflow, and deposition; this balance is controlled by the water residence time and is positively correlated with water depth. Huating Lake is a typical low-flow, deep-water lake in which residual P in particles that are not metabolized settles and is usually buried in the bottom sediments, and such P losses are greater in deeper lakes. Notably, under biogeochemically reducing conditions, only a small percentage of N is lost to the atmosphere in gaseous form (e.g., N₂) through denitrification, so N losses are small. Thus, although N and P share common sources in the Huating Lake basin, P limits eutrophication because it is lost from the water column to a greater extent than N.
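The residence-time argument can be made concrete with the basin figures given in Section 2.1. The estimate below is a rough sketch that assumes the total reservoir capacity V approximates the stored water volume and that mean inflow Q approximates throughflow (ignoring evaporation, withdrawals, and storage changes); since capacity exceeds the volume actually held at most times, it is best read as an upper bound. Both inflow figures give a hydraulic residence time of about 1.6 years, consistent with the low-flow characterization.

```latex
\begin{aligned}
\tau &\approx \frac{V}{Q}
      = \frac{2.366 \times 10^{9}\,\mathrm{m^{3}}}{46.3\,\mathrm{m^{3}\,s^{-1}}}
      \approx 5.11 \times 10^{7}\,\mathrm{s}
      \approx 591\,\mathrm{d}
      \approx 1.6\,\mathrm{yr}, \\
\tau &\approx \frac{2.366 \times 10^{9}\,\mathrm{m^{3}}}{1.46 \times 10^{9}\,\mathrm{m^{3}\,yr^{-1}}}
      \approx 1.62\,\mathrm{yr}.
\end{aligned}
```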
Phytoplankton diversity, richness, and evenness in the Huating Lake basin were assessed using algae cell density, H′, H, D, 1/D, and J, which are used to evaluate aquatic ecosystem health and stability (Fig.3(a)–3(f)) (Li et al., 2022). H′ was highest at LS-2 in March 2023 (2.30) and lowest at LS-3 in March 2023 (0.48), with a whole-lake mean of 1.5. In the inflow rivers, the highest and lowest H′ values occurred at RS-2 in March 2023 (2.88) and at RS-2 in September 2022 (1.1), respectively, with a mean of 1.8 (Fig.3(b)). Based on H′, the Huating Lake basin is mesotrophic. These results indicate spatiotemporal variations in species diversity across the water bodies of the basin. J and H exhibited similar trends (Fig.3(c) and Fig.3(f)). The other parameters (algae cell density, D, and 1/D) are presented in Fig.3(a), Fig.3(d), and Fig.3(e), respectively, and Tables S10 and S11 list the mean, maximum, and minimum values. Taken together, these results show that many factors, both natural and anthropogenic, affect the health of aquatic ecosystems, with human activities having a particular impact on water bodies and aquatic ecosystems (Xu and Su, 2019; Brown et al., 2021; Chi et al., 2021).
Eutrophication of water bodies is typically accompanied by rapid phytoplankton growth, and MC is an important indicator of the extent to which phytoplankton degrade water quality (Li et al., 2022). In the lake, MC peaked at LS-2 in March 2023 (641.45 ng/L) and was lowest at LS-3 in March 2023 (64.75 ng/L), with a whole-lake mean of 310.36 ng/L. For the inflow rivers, MC peaked at RS-3 in September 2022 (466.22 ng/L) and was lowest at RS-3 in March 2023 (82.55 ng/L), with a mean of 247.59 ng/L (Fig.2(d)). Notably, the minimum MC in the inflow rivers was higher than that in the lake, although the lake had higher mean and maximum MC values. The mean, maximum, and minimum values of the other parameters (T, pH, v, SD, EC, ORP, NH4–N, AP, CODMn, and DO) are presented in Tables S10 and S11 and Figs. S2 and S3.

3.2 MIC screening input parameters

Correlations among eutrophication parameters, excluding the top three dominant species (1st, 2nd, and 3rd), were analyzed using MIC (Fig.4). Identifying the parameters associated with nutrient limitation is crucial. Pearson correlation analysis (Fig. S4) revealed a significant positive correlation between TN and TP (r > 0.46, p < 0.05) and significant correlations between TN/TP and TN or TP (r > 0.6, p < 0.05). These findings suggest that TN and TP inputs to the lake share sources, such as agricultural pollution and urban wastewater (Yang et al., 2017; Hua et al., 2019; Li et al., 2023). Additionally, TN and TP were negatively correlated with parameters such as T, pH, EC, and ORP (|r| > 0.5, p < 0.05). Furthermore, evaluating the correlations between the limiting nutrients (e.g., NH4–N and AP) and other parameters (T, pH, v, SD, EC, and ORP) revealed a significant correlation between T and the limiting nutrients (r > 0.52, p < 0.05). pH was significantly positively correlated with N-limiting nutrients (r > 0.5, p < 0.05) but significantly negatively correlated with P-limiting nutrients (|r| > 0.5, p < 0.05). These results suggest that parameters such as T and pH significantly affect the presence of limiting nutrients in lakes (Jin et al., 2010; Zhang et al., 2010; Horppila, 2019).
Fig.4 MIC values between parameters.

The MIC and Pearson correlation analyses revealed positive correlations between eutrophication parameters (such as TN and TP) and phytoplankton parameters. H′ was positively correlated with TN, T, and pH (p < 0.05; Fig.4), indicating that increasing eutrophication of the water body promotes phytoplankton population growth (Zhang et al., 2010; Hu et al., 2021). MC was also correlated with the degree of eutrophication (r > 0.4, p < 0.05). Microcystins are toxins produced by certain cyanobacteria, notably the genus Microcystis, and are among the major toxic byproducts of water body eutrophication (Fuente et al., 2019). Water bodies enriched with nutrients such as N and P promote the growth and reproduction of Microcystis, thereby increasing MC production and accumulation (Burdick et al., 2020; Xu et al., 2021). MC was correlated (r > 0.58, p < 0.05) with parameters such as T, pH, EC, and ORP, with the correlation reaching 0.9 for T. This may be because higher water T promotes the growth and reproduction of organisms (Ye et al., 2014). Because too many input parameters increase the likelihood of multicollinearity, whereas too few may discard useful information, we combined MIC and Pearson correlation to streamline the input and target parameters for each prediction in the subsequent training and validation processes (Table S12).

3.3 Performance evaluation of ML models

Using the MIC-screened input parameters, we compared the prediction performance of the ML models (Fig.5(a)–5(f), Figs. S5(a)–S5(i), and Figs. S6(a)–S6(i)). For TN/TP, the other four models outperformed the DT model (Tables S13 and S14). This result is primarily due to two factors (Kim, 2016). First, DT decision boundaries are not smooth because only a single variable is considered at each node and splits are made at specific points. Second, DT provides limited class probability estimates and assigns the same score to all instances within a terminal node; DT is also sensitive to minor changes in input features, which can substantially alter the resulting tree structure. Consistent with this instability, Li et al. (2022) found that LightGBM and XGBoost outperformed DT in predicting microbial fecal pollution in beach water based on environmental parameters.
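The first limitation of DT can be demonstrated directly: a single regression tree returns one constant per leaf, so its predictions form a step function, whereas averaging many bootstrapped trees produces many more distinct prediction levels and a smoother response. The sketch below uses synthetic data and arbitrary settings (depth 3, 20 trees) purely for illustration; it does not reproduce the paper's models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, size=(200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
forest = RandomForestRegressor(n_estimators=20, max_depth=3, random_state=0).fit(X, y)

grid = np.linspace(0, 10, 500).reshape(-1, 1)
# A single tree outputs at most one constant per leaf (at most 8 leaves at depth 3)
n_tree_levels = len(np.unique(tree.predict(grid)))
# Averaging bootstrapped trees with different split points yields many more levels
n_forest_levels = len(np.unique(forest.predict(grid)))
print(n_tree_levels, tree.get_n_leaves(), n_forest_levels)
```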
Fig.5 Plot of percentage evaluation metrics after 500 runs of five ML models, with R2, RMSE, and MAPE metrics predicted for each target parameter (H′, J, D, H, TN/TP, and MC): (a) H′; (b) J; (c) D; (d) H; (e) TN/TP; (f) MC.

The XGBoost model performed best, exhibiting high prediction accuracy and stability (R2 = 0.88; RMSE = 24.60; MAPE = 26.14%; MSE = 605.01; MAE = 10.98) (Figs. 5(e) and S5(d–f)). Because each new tree in XGBoost is fitted to correct the errors of the preceding ensemble, the model is well suited to complex data sets (Liu et al., 2018). In addition, XGBoost pre-sorts feature values and stores them in a block structure that is reused in subsequent iterations, which ensures computational efficiency (Dhaliwal et al., 2018). We therefore selected the XGBoost model for further feature ranking; it best fitted the complex, dynamic, and nonlinear relationships in TN/TP prediction. Similarly, a study developing a new water quality index (WQI) model for Cork Harbour's coastal waters found that XGBoost showed the smallest discrepancy between predicted and actual WQI values, and its predictive accuracy surpassed that of other models, including DT, RF, and Extra Trees (ExT), at each monitoring site (Uddin et al., 2022a).
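The error-correcting idea behind XGBoost can be illustrated with plain gradient boosting for squared error: each new tree is fitted to the residuals left by the current ensemble and added with a shrinkage factor. The sketch below is a simplified, hypothetical illustration using depth-1 trees; XGBoost additionally uses second-order gradient information, regularized leaf weights, deeper trees, and its block-based split search, none of which are reproduced here. The function names and default settings are assumptions.

```python
def fit_stump(x, y):
    """Least-squares depth-1 regression tree; returns a prediction function."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between identical feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr


def boost(x, y, n_rounds=30, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, train_mse = [], []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # errors still to correct
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
        train_mse.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

    def predict(v):
        return base + learning_rate * sum(s(v) for s in stumps)

    return predict, train_mse
```

Because each stump is a least-squares fit to the residuals, the training MSE recorded in `train_mse` never increases from one round to the next; this is the sense in which boosting corrects earlier errors, although held-out data are still needed to detect overfitting.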
For the target parameters H′, J, D, H, and MC (Fig.5(a)–Fig.5(d) and Fig.5(f), respectively), combining the prediction results and stability on the test set showed that the LightGBM model achieved better prediction performance and stability than the other four models, with the best results for H′ (R2 = 0.92; RMSE = 0.11; MAPE = 4.95%; MSE = 0.01; MAE = 0.08) (Figs. S5(a–c), S5(g–i), and S6). LightGBM is a gradient-boosting tree algorithm that is faster and more memory-efficient than conventional implementations, and it can effectively handle data with class imbalance (Dhaliwal et al., 2018). We therefore selected the LightGBM model for further feature ranking; it best captured the complex, dynamic, and nonlinear relationships in phytoplankton parameter predictions.
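Much of LightGBM's speed comes from histogram-based split finding: feature values are first grouped into a fixed number of bins, gradient sums are accumulated per bin, and candidate splits are evaluated only at bin edges rather than at every distinct value. The sketch below is a simplified, hypothetical version for squared-error loss (unit Hessians, no regularization) using equal-width bins; LightGBM's own binning (controlled by its `max_bin` parameter) and its leaf-wise tree growth are not reproduced, and the function names are assumptions.

```python
def build_histogram(x, grad, n_bins):
    """Accumulate gradient sums and counts in equal-width bins of feature x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, gi in zip(x, grad):
        b = min(int((xi - lo) / width), n_bins - 1)
        sums[b] += gi
        counts[b] += 1
    edges = [lo + width * (b + 1) for b in range(n_bins - 1)]  # candidate thresholds
    return sums, counts, edges


def best_histogram_split(x, grad, n_bins=16):
    """Scan bin edges once; gain = G_L^2/n_L + G_R^2/n_R - G^2/n."""
    sums, counts, edges = build_histogram(x, grad, n_bins)
    g_total, n_total = sum(sums), sum(counts)
    best_gain, best_edge = 0.0, None
    g_left, n_left = 0.0, 0
    for b in range(n_bins - 1):
        g_left += sums[b]
        n_left += counts[b]
        n_right = n_total - n_left
        if n_left == 0 or n_right == 0:
            continue
        g_right = g_total - g_left
        gain = g_left ** 2 / n_left + g_right ** 2 / n_right - g_total ** 2 / n_total
        if gain > best_gain:
            best_gain, best_edge = gain, edges[b]
    return best_edge, best_gain
```

The scan costs one pass over the bins instead of one pass per distinct feature value, which is why the number of bins, not the number of samples, bounds the split search.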
Validation on untrained data showed that the XGBoost model for TN/TP achieved R2 = 0.86, RMSE = 26.83, MAPE = 29.42%, MSE = 719.85, and MAE = 12.03, whereas the LightGBM model for phytoplankton-associated parameters, excluding H and MC, achieved R2 values of 0.87–0.90. These results support the reliability of the subsequent feature-importance analysis derived from the trained models.
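The five metrics reported above follow standard definitions, sketched below in plain Python. The function name is hypothetical; MAPE assumes that no observed value is zero, and R2 is computed as 1 − SS_res/SS_tot.

```python
import math


def regression_metrics(y_true, y_pred):
    """R2, RMSE, MAPE (%), MSE, and MAE for paired observed/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    mse = sse / n
    return {
        "R2": 1 - sse / ss_tot,
        "RMSE": math.sqrt(mse),
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,  # t != 0
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
    }
```

Because RMSE is the square root of MSE, the reported values can be cross-checked: 24.60² ≈ 605.2 against a reported MSE of 605.01 (a rounding difference), and 26.83² ≈ 719.85, matching the reported validation MSE.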

3.4 Identification and prediction of lake eutrophication

In this study, the XGBoost model was used to make timely TN/TP predictions to identify the limiting nutrients in lakes, because it handles nonlinear relationships with high prediction accuracy (Text S1) and outperformed the other models in TN/TP prediction (Fig.5). The LightGBM model was then employed to predict MC in water and assess the severity of eutrophication hazards in cyanobacteria-dominated lakes (Dong et al., 2016; Yuan and Pollard, 2017). Finally, the LightGBM model was used to predict the phytoplankton parameter H′. Applying the H′ thresholds (Table S4) to these predictions yielded accurate predictions of dynamic changes in water eutrophication (Tian et al., 2021). The feature weights and rankings for each target parameter (H′, J, D, H, TN/TP, and MC) are analyzed below based on the XGBoost and LightGBM outputs (Fig.6(a)–Fig.6(f)).
Fig.6 Feature weights and rankings based on XGBoost and LightGBM outputs for each target parameter (H′, J, D, H, TN/TP, and MC) prediction: (a) H′, (b) J, (c) D, (d) H, (e) TN/TP, and (f) MC.

3.4.1 Identification of lake eutrophication

The XGBoost-based evaluation of the weight (degree of influence) of each feature in TN/TP prediction showed that physicochemical parameters contributed substantially to the model's TN/TP prediction, in descending order of TP (0.907), TN (0.027), T, pH, and MC (Fig.6(e)). TP contributed the most to the TN/TP prediction of the XGBoost model, suggesting that the Huating Lake basin may be P-limited (Fig. S7(a)). This is consistent with the earlier TN/TP results, which indicated that eutrophication in the Huating Lake basin is P-limited (Tables S10 and S11). A previous study applied ML models, including XGBoost, to predict TN and TP removal efficiencies in wastewater treatment plants (Wang et al., 2021). In the lake environment studied here, we found that the XGBoost-derived contributions of TN and TP to the TN/TP prediction can reveal the limiting nutrients of lake eutrophication, providing a new method for identifying such nutrients. A recent study on lake trophic state indices also demonstrated that TN/TP plays an important role in determining lake trophic levels (Zhang et al., 2023). In addition, a previous study on the effect of lake depth on trophic levels found that changes in TN/TP influenced trophic levels and that the two were closely related (Qin et al., 2020).
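Once a TN/TP value has been predicted, assigning a limiting nutrient is a threshold rule. The study's own criteria are given in Tables S10 and S11, which are not reproduced here; the sketch below therefore assumes placeholder cut-offs on the molar ratio (below 20: N-limited; above 50: P-limited), a convention used in parts of the limnological literature, and converts mass concentrations to a molar ratio using the atomic masses of N and P. The function and constant names are hypothetical.

```python
N_ATOMIC_MASS = 14.007  # g/mol
P_ATOMIC_MASS = 30.974  # g/mol


def mass_to_molar_ratio(tn_tp_mass):
    """Convert a TN/TP mass ratio (mg/L over mg/L) to a molar ratio."""
    return tn_tp_mass * P_ATOMIC_MASS / N_ATOMIC_MASS


def limiting_nutrient(tn_mg_l, tp_mg_l, n_limit=20.0, p_limit=50.0):
    """Classify nutrient limitation; the molar cut-offs are placeholder assumptions."""
    molar = mass_to_molar_ratio(tn_mg_l / tp_mg_l)
    if molar < n_limit:
        return "N-limited"
    if molar > p_limit:
        return "P-limited"
    return "co-limited or transitional"
```

For example, TN = 2.0 mg/L and TP = 0.02 mg/L give a mass ratio of 100 and a molar ratio of about 221, which this rule classifies as P-limited.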
The LightGBM-based evaluation of feature weights in MC prediction ranked the features in descending order as DO (0.17), TN/TP (0.15), 3rd (0.13), and T (0.13), which together account for more than 50% of the total weight (Fig.6(f)). Higher DO levels help limit algal growth and toxin production, whereas lower DO levels stimulate these processes (Tian et al., 2021). A high TN/TP ratio may also promote algal growth and toxin production (Qin et al., 2020). In Huating Lake, the average DO concentration was 9.7 mg/L and the average TN/TP ratio was 68.2 (Table S10), which are conducive to phytoplankton blooms; accordingly, these two features contributed the most to MC. Meanwhile, 3rd and T were related to the abundance and species composition of algae. Cyanobacteria and green algae predominated in Huating Lake (Figs. S8 and S9). Because higher phytoplankton abundance and T may stimulate algal growth, accelerate algal toxin production, and increase toxin concentrations in water bodies (Behrenfeld et al., 2021), they may also contribute to MC to a certain degree. In lakes where eutrophication drives rapid blooms dominated by cyanobacteria and green algae, changes in MC can be used to determine whether water bodies are at risk of eutrophication.
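The statement that four features together exceed 50% of the weight amounts to normalizing the importances and accumulating them in ranked order. In the sketch below, the four shares reported in Fig.6(f) are used, and the remaining 42% is split among hypothetical placeholder features (`other_1` to `other_4`) purely for illustration; the raw values stand in for LightGBM gain importances and are not the study's outputs.

```python
def normalize_importance(raw):
    """Rescale raw importances (e.g., split gains) so that they sum to 1."""
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def top_features_covering(weights, share=0.5):
    """Smallest top-ranked feature set whose cumulative weight reaches `share`."""
    # sorted() stays stable with reverse=True, so ties keep their input order
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for name, weight in ranked:
        chosen.append(name)
        cumulative += weight
        if cumulative >= share:
            break
    return chosen, cumulative


raw_gain = {"DO": 17, "TN/TP": 15, "3rd": 13, "T": 13,
            "other_1": 12, "other_2": 10, "other_3": 10, "other_4": 10}  # hypothetical raw values
features, total_share = top_features_covering(normalize_importance(raw_gain))
print(features, round(total_share, 2))  # ['DO', 'TN/TP', '3rd', 'T'] 0.58
```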

3.4.2 Prediction of lake eutrophication

The LightGBM-based feature weights for the predicted target parameters showed that the phytoplankton parameters (H′, H, J, and D) strongly influenced one another (Fig.6(a)–Fig.6(d) and Fig.6(f)). The survival and reproduction of phytoplankton may be interconnected because they share common drivers, including other organisms, DO, light, and many other factors (Litchman and Klausmeier, 2008). Consequently, the phytoplankton parameters exert a strong mutual influence, consistent with the stronger correlations among them in the MIC analysis (Fig.4).
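Part of this mutual influence is structural: the diversity indices are computed from the same species counts, and some are defined in terms of one another. The sketch below assumes the common textbook definitions (Shannon H′ with natural logarithms, Pielou evenness J = H′/ln S, and Margalef richness D = (S − 1)/ln N, where S is the number of taxa and N the total count); the study's exact formulas, including that of H, are not shown in this section and may differ. The function name and the counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Shannon H', Pielou J, and Margalef D from per-taxon cell counts."""
    counts = [c for c in counts if c > 0]
    n_total = sum(counts)
    n_taxa = len(counts)
    h_prime = -sum((c / n_total) * math.log(c / n_total) for c in counts)
    j = h_prime / math.log(n_taxa) if n_taxa > 1 else 0.0  # J is H' rescaled by ln S
    d = (n_taxa - 1) / math.log(n_total) if n_total > 1 else 0.0
    return {"H_prime": h_prime, "J": j, "D": d}


even = diversity_indices([10, 10, 10, 10])
skewed = diversity_indices([37, 1, 1, 1])  # same S and N, lower evenness
```

Both samples share S and N, so D is identical, whereas H′ and J fall together for the skewed sample. Such built-in dependencies may help explain why these indices rank highly as features for one another.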
After the mutual effects of the phytoplankton parameters were excluded, physicochemical parameters (e.g., TN, TP, and T) remained key factors affecting phytoplankton growth and community structure (Rao et al., 2018; Behrenfeld et al., 2021). Through their direct and indirect effects on phytoplankton growth and community evolution, these physicochemical parameters affect the eutrophication of water bodies. TN contributed to the prediction of all four parameters (H′, H, J, and D), indicating the feasibility of using phytoplankton parameters to predict the eutrophication status of lakes. Meng et al. (2020) investigated the response of phytoplankton α-diversity indices to changes in the trophic status of aquatic habitats in the Harbin section of the Songhua River and found that these indices could reveal trophic dynamics and their relationships with trophic status, supporting our conclusions. In addition, many studies on the relationship between lake nutrient levels and phytoplankton have demonstrated that phytoplankton can serve as effective indicators of lake nutrient levels under most conditions (Shan et al., 2019; Uddin et al., 2024a). Our ML-based analysis further specifies and clarifies the relationship between phytoplankton indicators and lake nutrient levels.
Further evaluation to identify the most suitable parameter for predicting dynamic changes in water body eutrophication showed that H′ outperformed H, J, and D (Fig. S6), indicating its potential as a eutrophication predictor. For example, Zhang and Zang (2015) determined that the water quality of the Zhalong Wetland in northeast China was mesotrophic using H′, based on monitoring data on the water environment and phytoplankton community structure. Jia et al. (2019) used H to characterize the response of the phytoplankton community to nutrient status in Poyang Lake, in the lower reaches of the Ganjiang River. In this study, the ML models enabled timely prediction of H′; combined with the identified thresholds (Table S4), the predictions indicate that the Huating Lake basin is mesotrophic (Fig. S7(b)).
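Turning predicted H′ values into trophic classes is a lookup against threshold intervals. Because Table S4 is not reproduced here, the sketch below uses placeholder cut-offs (H′ < 1 eutrophic, 1 ≤ H′ < 3 mesotrophic, H′ ≥ 3 oligotrophic) that are assumptions for illustration only and should be replaced by the study's thresholds; the names and the predicted values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Placeholder thresholds, NOT the values in Table S4.
H_PRIME_EDGES = [1.0, 3.0]
TROPHIC_LABELS = ["eutrophic", "mesotrophic", "oligotrophic"]


def trophic_class(h_prime, edges=H_PRIME_EDGES, labels=TROPHIC_LABELS):
    """Map one H' value to a label; a value equal to an edge goes to the upper class."""
    return labels[bisect_right(edges, h_prime)]


def summarize(predicted_h):
    """Label each predicted H' value (e.g., one per site or date) and count the classes."""
    labels = [trophic_class(h) for h in predicted_h]
    return labels, Counter(labels)


labels, counts = summarize([2.1, 2.4, 0.8, 3.2, 1.9])  # hypothetical predictions
print(counts.most_common(1))  # [('mesotrophic', 3)]
```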

3.5 Implications and limitations

This study of Huating Lake exemplifies the utility of ML models in analyzing nutrient limitation and variation in lakes influenced by diverse pollution sources, such as livestock breeding, agriculture, domestic activities, and industrial operations. Using data on physicochemical water quality parameters, meteorological conditions, and phytoplankton, it highlights the effectiveness of the XGBoost model in predicting TN/TP ratios and of the LightGBM model in predicting H′. We recommend prioritizing these models in future investigations of lake nutrient dynamics to address the challenges posed by ecological and environmental variability.
The primary objective of this study was to forecast trophic shifts and identify limiting nutrients in Huating Lake. Given its effectiveness in identifying lake eutrophication, our approach could be extended to other pollutants and lakes (Wu et al., 2017; Yu et al., 2018; Hu et al., 2023). However, some limitations must be acknowledged. This study mainly considers pollution from industrial, agricultural, and domestic sources, making it applicable to lakes influenced by these human activities. The applicability of our findings to lakes affected by other pollutants, such as organic compounds and antibiotics, is yet to be determined; lakes with diverse pollution profiles should be investigated to confirm the generalizability of the models. Moreover, the water quality assessments in the Huating Lake basin suggest that local authorities should strengthen the monitoring of key eutrophication contributors, especially P, because it is a critical limiting nutrient. Stringent controls on P emissions from agricultural, industrial, and urban drainage sources are essential for reducing external P inputs and mitigating lake eutrophication. The success of ML models is inherently linked to the quality and completeness of the data set (Uddin et al., 2022b), and the scope of this study is limited by its geographical area and temporal coverage. Future research should encompass a wider spatiotemporal range to validate our results.

4 Conclusions

The water quality of the Huating Lake basin exhibits notable spatiotemporal variability, with environmental factors and human activities influencing eutrophication. Analysis of the TN/TP ratio revealed that P is a primary limiting factor contributing to eutrophication, probably due to the low-flow and deep-water characteristics of the lake. To further predict and identify eutrophication in the lake, five tree-based ML models were trained using specific environmental factors, including TN/TP and phytoplankton parameters.
● The XGBoost model predicted TN/TP values more accurately (R2 = 0.88) than the other models, because it handles nonlinear data with high prediction accuracy. This result highlights the role of P as the primary limiting nutrient contributing to eutrophication in the Huating Lake basin.
● The LightGBM model exhibited superior performance in predicting phytoplankton parameters, excluding H and MC. In particular, high prediction accuracy was obtained for H′ (R2 = 0.92; RMSE = 0.11; MAPE = 4.95%). Applying the H′ thresholds further indicated that Huating Lake is mesotrophic, in agreement with the earlier assessments based on nutrient elements.
● Incorporating TN/TP and H′ parameters into the ML models demonstrated their potential in identifying and predicting limiting nutrients in lake eutrophication and characterizing changes in water quality using phytoplankton parameters. Future studies should cover a wider range of pollution sources and spatiotemporal dimensions to further validate our findings.
This study highlights the immense potential of incorporating TN/TP and phytoplankton parameters into ML models for environmental monitoring and management.

References

[1] Behrenfeld M J, Boss E S, Halsey K H. (2021). Phytoplankton community structuring and succession in a competition-neutral resource landscape. ISME Communications, 1(1): 12
[2] Brown K P, Gerber A, Bedulina D, Timofeyev M A. (2021). Human impact and ecosystemic health at Lake Baikal. WIREs Water, 8(4): e1528
[3] Burdick S M, Hewitt D A, Martin B A, Schenk L, Rounds S A. (2020). Effects of harmful algal blooms and associated water-quality on endangered Lost River and shortnose suckers. Harmful Algae, 97: 101847
[4] Carrasco Navas-Parejo J C, Corzo A, Papaspyrou S. (2020). Seasonal cycles of phytoplankton biomass and primary production in a tropical temporarily open-closed estuarine lagoon: the effect of an extreme climatic event. Science of the Total Environment, 723: 138014
[5] Chi Y, Liu D, Xing W, Wang J. (2021). Island ecosystem health in the context of human activities with different types and intensities. Journal of Cleaner Production, 281: 125334
[6] Conley D J, Paerl H W, Howarth R W, Boesch D F, Seitzinger S P, Havens K E, Lancelot C, Likens G E. (2009). Controlling eutrophication: nitrogen and phosphorus. Science, 323(5917): 1014–1015
[7] Derot J, Jamoneau A, Teichert N, Rosebery J, Morin S, Laplace-Treyture C. (2020). Response of phytoplankton traits to environmental variables in French lakes: new perspectives for bioindication. Ecological Indicators, 108: 105659
[8] Dhaliwal S, Nahid A, Abbas R. (2018). Effective intrusion detection system using XGBoost. Information, 9(7): 149
[9] Ding F, Zhang W, Cao S, Hao S, Chen L, Xie X, Li W, Jiang M. (2023). Optimization of water quality index models using machine learning approaches. Water Research, 243: 120337
[10] Dong X, Zeng S, Bai F, Li D, He M. (2016). Extracellular microcystin prediction based on toxigenic Microcystis detection in a eutrophic lake. Scientific Reports, 6(1): 20886
[11] Feng C. (2007). Studies on the agricultural ecological tour development in Huating Lake scenic spot. Anhui Nongye Kexue, 35(7): 2035–2037 (in Chinese)
[12] Feng L, Dai Y, Hou X, Xu Y, Liu J, Zheng C. (2021). Concerns about phytoplankton bloom trends in global lakes. Nature, 590(7846): E35–E47
[13] Fortes A C C, Barrocas P R G, Kligerman D C. (2023). Water quality indices: construction, potential, and limitations. Ecological Indicators, 157: 111187
[14] Fuente A D L, Muro-Pastor A M, Merchán F, Madrid F, Pérez-Martínez J I, Undabeytia T. (2019). Electrocoagulation/flocculation of cyanobacteria from surface waters. Journal of Cleaner Production, 238: 117964
[15] Ge F, Ma Z, Chen B, Wang Y, Lu X, An S, Zhang D, Zhang W, Yu W, Han W. (2022). Phytoplankton species diversity patterns and associated driving factors in China's Jiulong River estuary: roles that nutrients and nutrient ratios play. Frontiers in Marine Science, 9: 829285
[16] Georgescu P L, Moldovanu S, Iticescu C, Calmuc M, Calmuc V, Topa C, Moraru L. (2023). Assessing and forecasting water quality in the Danube River by using neural network approaches. Science of the Total Environment, 879: 162998
[17] Horppila J. (2019). Sediment nutrients, ecological status and restoration of lakes. Water Research, 160: 206–208
[18] Howarth R W, Marino R. (2006). Nitrogen as the limiting nutrient for eutrophication in coastal marine ecosystems: evolving views over three decades. Limnology and Oceanography, 51(1, part 2): 364–376
[19] Hu L, Shan K, Huang L, Li Y, Zhao L, Zhou Q, Song L. (2021). Environmental factors associated with cyanobacterial assemblages in a mesotrophic subtropical plateau lake: a focus on bloom toxicity. Science of the Total Environment, 777: 146052
[20] Hu Y, Du W, Yang C, Wang Y, Huang T, Xu X, Li W. (2023). Source identification and prediction of nitrogen and phosphorus pollution of Lake Taihu by an ensemble machine learning technique. Frontiers of Environmental Science & Engineering, 17(5): 55
[21] Hua L, Li W, Zhai L, Yen H, Lei Q, Liu H, Ren T, Xia Y, Zhang F, Fan X. (2019). An innovative approach to identifying agricultural pollution sources and loads by using nutrient export coefficients in watershed modeling. Journal of Hydrology, 571: 322–331
[22] Jenkins S H. (1982). Standard methods for the examination of water and wastewater. Water Research, 16(10): 1495–1496
[23] Jia J, Gao Y, Song X, Chen S. (2019). Characteristics of phytoplankton community and water net primary productivity response to the nutrient status of the Poyang Lake and Gan River, China. Ecohydrology, 12(7): e2136
[24] Jiang M, Nakano S I. (2022). The crucial influence of trophic status on the relative requirement of nitrogen to phosphorus for phytoplankton growth. Water Research, 222: 118868
[25] Jin M, Ren Z, Shi J P, Huang X Z, Chen J R. (2010). Impact of agricultural non-point source pollution in eutrophic water body of Taihu Lake. Environmental Science & Technology, 33(10): 106–111
[26] Kim K. (2016). A hybrid classification algorithm by subspace partitioning through semi-supervised decision tree. Pattern Recognition, 60: 157–163
[27] Li N, Wang J, Yin W, Jia H, Xu J, Hao R, Zhong Z, Shi Z. (2021). Linking water environmental factors and the local watershed landscape to the chlorophyll a concentration in reservoir bays. Science of the Total Environment, 758: 143617
[28] Li S, Liu C, Sun P, Ni T. (2022). Response of cyanobacterial bloom risk to nitrogen and phosphorus concentrations in large shallow lakes determined through geographical detector: a case study of Taihu Lake, China. Science of the Total Environment, 816: 151617
[29] Li X, Xu W, Song S, Sun J. (2023). Sources and spatiotemporal distribution characteristics of nitrogen and phosphorus loads in the Haihe River Basin, China. Marine Pollution Bulletin, 189: 114756
[30] Litchman E, Klausmeier C A. (2008). Trait-based community ecology of phytoplankton. Annual Review of Ecology, Evolution, and Systematics, 39(1): 615–639
[31] Liu Y, Luo H, Zhao B, Zhao X, Han Z. (2018). Short-Term Power Load Forecasting Based on Clustering and XGBoost Method. New York: Institute of Electrical and Electronics Engineers
[32] Liu Y, Zhuang Y, Ji B, Zhang G, Rong L, Teng G, Wang C. (2022). Prediction of laying hen house odor concentrations using machine learning models based on small sample data. Computers and Electronics in Agriculture, 195: 106849
[33] Meng F, Li Z, Li L, Lu F, Liu Y, Lu X, Fan Y. (2020). Phytoplankton alpha diversity indices response the trophic state variation in hydrologically connected aquatic habitats in the Harbin Section of the Songhua River. Scientific Reports, 10(1): 21337
[34] Muhid P, Davis T W, Bunn S E, Burford M A. (2013). Effects of inorganic nutrients in recycled water on freshwater phytoplankton biomass and composition. Water Research, 47(1): 384–394
[35] Qin B, Zhou J, Elser J J, Gardner W S, Deng J, Brookes J D. (2020). Water depth underpins the relative roles and fates of nitrogen and phosphorus in lakes. Environmental Science & Technology, 54(6): 3191–3198
[36] Rao K, Zhang X, Yi X, Li Z, Wang P, Huang G, Guo X. (2018). Interactive effects of environmental factors on phytoplankton communities and benthic nutrient interactions in a shallow lake and adjoining rivers in China. Science of the Total Environment, 619–620: 1661–1672
[37] Reddy G T, Reddy M P K, Lakshmanna K, Kaluri R, Rajput D S, Srivastava G, Baker T. (2020). Analysis of dimensionality reduction techniques on big data. IEEE Access: Practical Innovations, Open Solutions, 8: 54776–54788
[38] Rezaie-Balf M, Attar N F, Mohammadzadeh A, Murti M A, Ahmed A N, Fai C M, Nabipour N, Alaghmand S, El-Shafie A. (2020). Physicochemical parameters data assimilation for efficient improvement of water quality index prediction: comparative assessment of a noise suppression hybridization approach. Journal of Cleaner Production, 271: 122576
[39] Shan K, Song L, Chen W, Li L, Liu L, Wu Y, Jia Y, Zhou Q, Peng L. (2019). Analysis of environmental drivers influencing interspecific variations and associations among bloom-forming cyanobacteria in large, shallow eutrophic lakes. Harmful Algae, 84: 84–94
[40] Singh K P, Malik A, Sinha S. (2005). Water quality assessment and apportionment of pollution sources of Gomti River (India) using multivariate statistical techniques: a case study. Analytica Chimica Acta, 538(1–2): 355–374
[41] Tian Y, Jiang Y, Liu Q, Xu D, Liu Y, Song J. (2021). The impacts of local and regional factors on the phytoplankton community dynamics in a temperate river, northern China. Ecological Indicators, 123: 107352
[42] Uddin M G, Nash S, Mahammad Diganta M T, Rahman A, Olbert A I. (2022a). Robust machine learning algorithms for predicting coastal water quality index. Journal of Environmental Management, 321(8): 115923
[43] Uddin M G, Nash S, Rahman A, Dabrowski T, Olbert A I. (2024a). Data-driven modelling for assessing trophic status in marine ecosystems using machine learning approaches. Environmental Research, 242: 117755
[44] Uddin M G, Nash S, Rahman A, Olbert A I. (2022b). A comprehensive method for improvement of water quality index (WQI) models for coastal water quality assessment. Water Research, 219: 118532
[45] Uddin M G, Nash S, Rahman A, Olbert A I. (2023a). A novel approach for estimating and predicting uncertainty in water quality index model using machine learning approaches. Water Research, 229: 119422
[46] Uddin M G, Nash S, Rahman A, Olbert A I. (2023b). A sophisticated model for rating water quality. Science of the Total Environment, 868: 161614
[47] Uddin M G, Rahman A, Rosa Taghikhah F, Olbert A I. (2024b). Data-driven evolution of water quality models: an in-depth investigation of innovative outlier detection approaches-A case study of Irish Water Quality Index (IEWQI) model. Water Research, 255: 121499
[48] Wang X, Fu D, Wang Y, Guo Y, Ding Y. (2021). The XGBoost and the SVM-based prediction models for bioretention cell decontamination effect. Arabian Journal of Geosciences, 14(8): 669
[49] Wu Z, Liu Y, Liang Z, Wu S, Guo H. (2017). Internal cycling, not external loading, decides the nutrient limitation in eutrophic lake: a dynamic model with temporal Bayesian hierarchical inference. Water Research, 116: 231–240
[50] Xiong J, Lin C, Cao Z, Hu M, Xue K, Chen X, Ma R. (2022). Development of remote sensing algorithm for total phosphorus concentration in eutrophic lakes: conventional or machine learning? Water Research, 215(1): 118213
[51] Xiong J, Lin C, Ma R, Cao Z. (2019). Remote sensing estimation of lake total phosphorus concentration based on MODIS: a case study of Lake Hongze. Remote Sensing, 11(17): 2068
[52] Xu W, Li X, Li Y, Sun Y, Zhang L, Huang Y, Yang Z. (2021). Rising temperature more strongly promotes low-abundance Paramecium to remove Microcystis and degrade Microcystins. Environmental Pollution, 291: 118143
[53] Xu W, Su X. (2019). Challenges and impacts of climate change and human activities on groundwater-dependent ecosystems in arid areas: a case study of the Nalenggele alluvial fan in NW China. Journal of Hydrology, 573: 376–385
[54] Yang Y, Gao B, Hao H, Zhou H, Lu J. (2017). Nitrogen and phosphorus in sediments in China: a national-scale assessment and review. Science of the Total Environment, 576: 840–849
[55] Ye R, Shan K, Gao H, Zhang R, Xiong W, Wang Y, Qian X. (2014). Spatio-temporal distribution patterns in environmental factors, chlorophyll-a and microcystins in a large shallow lake, Lake Taihu, China. International Journal of Environmental Research and Public Health, 11(5): 5155–5169
[56] Yu H, Jiang S, Land K C. (2015). Multicollinearity in hierarchical linear models. Social Science Research, 53: 118–136
[57] Yu Q, Wang F, Yan W, Zhang F, Lv S, Li Y. (2018). Carbon and nitrogen burial and response to climate change and anthropogenic disturbance in Chaohu Lake, China. International Journal of Environmental Research and Public Health, 15(12): 2734
[58] Yuan L L, Pollard A I. (2017). Using national-scale data to develop nutrient–microcystin relationships that guide management decisions. Environmental Science & Technology, 51(12): 6972–6980
[59] Zhang F, Xue B, Cai Y, Xu H, Zou W. (2023). Utility of trophic state index in lakes and reservoirs in the Chinese eastern plains ecoregion: the key role of water depth. Ecological Indicators, 148: 110029
[60] Zhang J, Fu P, Meng F, Yang X, Xu J, Cui Y. (2022). Estimation algorithm for chlorophyll-a concentrations in water from hyperspectral images based on feature derivation and ensemble learning. Ecological Informatics, 71: 101783
[61] Zhang M, Leyi N, Cao T, Fang T, Xiong D W, Zhou G J, Zhu G R, Jun X U, Guo L G. (2010). Impact of aquatic environmental factors on distribution pattern of aquatic macrophytes in upper reaches of Taihu Lake watershed. Environmental Science & Technology, 33(3): 171–174
[62] Zhang N, Zang S. (2015). Characteristics of phytoplankton distribution for assessment of water quality in the Zhalong Wetland, China. International Journal of Environmental Science and Technology, 12(11): 3657–3664
[63] Znachor P, Nedoma J, Hejzlar J, Seďa J, Komárková J, Kolář V, Mrkvička T, Boukal D S. (2020). Changing environmental conditions underpin long-term patterns of phytoplankton in a freshwater reservoir. Science of the Total Environment, 710: 135626

Acknowledgements

The authors acknowledge the National Natural Science Foundation of China (Nos. 51278001 and U22A20401), and the Anhui Province Major Science and Technology Projects (China) (No. 202003a0702014) for supporting this work. We thank Letpub for its linguistic assistance during the preparation of this manuscript.

Conflict of Interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Electronic Supplementary Material

Supplementary material is available in the online version of this article at https://doi.org/10.1007/s11783-025-1934-6 and is accessible for authorized users.

RIGHTS & PERMISSIONS

© Higher Education Press 2025
Supplementary files

FSE-24107-of-FY_suppl_1 (2725 KB)
