Utilizing machine learning models to grasp water quality dynamic changes in lake eutrophication through phytoplankton parameters

Yong Fang, Ruting Huang, Yeyin Zhang, Jun Zhang, Wenni Xi, Xianyang Shi

Front. Environ. Sci. Eng. ›› 2025, Vol. 19 ›› Issue (2) : 14. DOI: 10.1007/s11783-025-1934-6
RESEARCH ARTICLE


Highlights

● Accurate identification of lake eutrophication was achieved via ML models.

● The XGBoost model showed superior performance in identifying limiting nutrients.

● The LightGBM model effectively uses phytoplankton parameters for water quality characterization.

● ML model with TN/TP ratio and phytoplankton can track lake eutrophication dynamics.

Abstract

Phytoplankton serve as vital indicators of eutrophication levels. However, relying solely on phytoplankton parameters such as chlorophyll-a limits our understanding of the intricate eutrophication conditions in natural lakes, particularly the timely analysis of changes in limiting nutrients and their concentrations. This study presents machine learning (ML) models for predicting and identifying lake eutrophication. Five tree-based ML models were developed using the latest data on hydrological, water quality, and meteorological parameters obtained from 34 sites in the Huating Lake basin over 5 months. The extreme gradient boosting model exhibited high accuracy in predicting the total nitrogen/total phosphorus (TN/TP) ratio (R² = 0.88; RMSE = 24.60; MAPE = 26.14%). Analysis of the TN/TP ratio and output eigenvalue weight revealed that phosphorus plays a crucial role in eutrophication, probably because of the low-flow and deep-water characteristics of the basin. Furthermore, the light gradient boosting machine model exhibited high accuracy in predicting phytoplankton parameters, especially the Shannon index (H′) (R² = 0.92; RMSE = 0.11; MAPE = 4.95%). The mesotrophic classification of Huating Lake determined using the H′ threshold coincided with the findings from the H′ analysis. Future research should cover a wider range of pollution sources and spatiotemporal dimensions to further validate our findings. Overall, this study highlights the potential of incorporating the TN/TP ratio and phytoplankton parameters into ML techniques for effective monitoring and management of environmental conditions.
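The abstract reports a Shannon index (H′) and three error metrics (R², RMSE, MAPE) without giving formulas on this page. The sketch below shows the standard textbook definitions of these quantities as a reading aid. It is not the authors' code: the function names are hypothetical, the natural logarithm is an assumption (some studies use log base 2 for H′), and the paper's own computation may differ in detail.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    Assumption: natural logarithm; `counts` are per-taxon abundances.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target variable."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be nonzero."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
```

Because RMSE carries the units of the target while MAPE is relative, the two TN/TP figures in the abstract (RMSE = 24.60; MAPE = 26.14%) are not directly comparable with the H′ figures (RMSE = 0.11; MAPE = 4.95%) without knowing the scale of each variable.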

Keywords

Machine learning / Lake / Phytoplankton / Water quality

Cite this article

Yong Fang, Ruting Huang, Yeyin Zhang, Jun Zhang, Wenni Xi, Xianyang Shi. Utilizing machine learning models to grasp water quality dynamic changes in lake eutrophication through phytoplankton parameters. Front. Environ. Sci. Eng., 2025, 19(2): 14 https://doi.org/10.1007/s11783-025-1934-6


Acknowledgements

The authors acknowledge the National Natural Science Foundation of China (Nos. 51278001 and U22A20401), and the Anhui Province Major Science and Technology Projects (China) (No. 202003a0702014) for supporting this work. We thank Letpub for its linguistic assistance during the preparation of this manuscript.

Conflict of Interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Electronic Supplementary Material

Supplementary material is available in the online version of this article at https://doi.org/10.1007/s11783-025-1934-6 and is accessible for authorized users.

RIGHTS & PERMISSIONS

© 2025 Higher Education Press