Approaching the upper boundary of driver-response relationships: identifying factors using a novel framework integrating quantile regression with interpretable machine learning

Zhongyao Liang, Yaoyang Xu, Gang Zhao, Wentao Lu, Zhenghui Fu, Shuhang Wang, Tyler Wagner

Front. Environ. Sci. Eng. ›› 2023, Vol. 17 ›› Issue (6) : 76

DOI: 10.1007/s11783-023-1676-2
RESEARCH ARTICLE

Abstract

● A novel framework integrating quantile regression with machine learning is proposed.

● It aims to identify factors driving observations toward the upper boundary of a relationship.

● Increasing N:P and TN concentration helps realize the full effect of TP on CHL.

● Wetter and warmer conditions decrease the potential and make eutrophication control more difficult.

● The framework advances applications of quantile regression and machine learning.

Identifying the factors that force ecological observations toward the upper boundary of a driver-response relationship provides insight into the mechanisms affecting that relationship and can help inform ecosystem management, but it has rarely been explored. In this study, we propose a novel framework integrating quantile regression with interpretable machine learning. In the first stage of the framework, we estimate the upper boundary of a driver-response relationship using quantile regression. Next, we calculate the “potential” of the response variable depending on the driver, defined as the vertical distance from the estimated upper boundary to each observation in the driver-response scatter plot. Finally, we identify key factors impacting the potential using a machine learning model. We illustrate the steps needed to implement the framework using the total phosphorus (TP)-chlorophyll a (CHL) relationship in lakes across the continental US. We found that the nitrogen to phosphorus ratio (N:P), annual average precipitation, total nitrogen (TN), and summer average air temperature were key factors impacting the potential of CHL depending on TP. These findings have important implications for lake eutrophication management. The important role of N:P and TN in the potential highlights the co-limitation of phosphorus and nitrogen and indicates the need for dual nutrient criteria. Future wetter and/or warmer climate scenarios can decrease the potential, which may reduce the efficacy of lake eutrophication management. The novel framework advances the application of quantile regression to identify factors driving observations toward the upper boundary of driver-response relationships.
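
The three stages described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the study: the data are synthetic, the names `x`, `y`, `z`, and `w` are hypothetical, the upper boundary is assumed to be a straight line at the 0.95 quantile, and a small k-nearest-neighbour regressor with permutation importance stands in for the machine learning model, which the abstract does not specify.

```python
import random
from itertools import combinations

rng = random.Random(0)

# Synthetic stand-in data (NOT the lake data set). x plays the role of the
# driver (e.g. log TP) and y the response (e.g. log CHL). Factor z pulls
# observations up toward the boundary; factor w is unrelated noise.
n = 80
x = [rng.uniform(0.0, 3.0) for _ in range(n)]
z = [rng.uniform(0.0, 1.0) for _ in range(n)]
w = [rng.uniform(0.0, 1.0) for _ in range(n)]
y = [1.0 + 0.8 * xi - 2.0 * (1.0 - zi) for xi, zi in zip(x, z)]


def pinball_loss(residuals, tau):
    """Quantile (check) loss summed over residuals y - fitted."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def fit_quantile_line(x, y, tau):
    """Stage 1: linear quantile regression with one predictor.

    Brute force over pairs of points: some minimiser of the pinball loss
    passes through two observations, so checking every pair is exact
    (and only practical for small samples like this one).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        loss = pinball_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


intercept, slope = fit_quantile_line(x, y, tau=0.95)

# Stage 2: potential = vertical distance from the boundary down to each point.
potentials = [intercept + slope * xi - yi for xi, yi in zip(x, y)]

# Stage 3: relate the potential to candidate factors and rank the factors.
factors = [[zi, wi] for zi, wi in zip(z, w)]


def knn_predict(train_X, train_y, query, skip, k=5):
    """Mean target of the k nearest training rows, leaving out row `skip`."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), idx)
        for idx, row in enumerate(train_X)
        if idx != skip
    )
    return sum(train_y[idx] for _, idx in ranked[:k]) / k


def loo_mse(train_X, train_y, query_X):
    """Leave-one-out mean squared error when predicting from query_X rows."""
    errors = [
        (knn_predict(train_X, train_y, query_X[i], skip=i) - train_y[i]) ** 2
        for i in range(len(train_y))
    ]
    return sum(errors) / len(errors)


def permutation_importance(X, target, col):
    """Increase in leave-one-out error after shuffling one factor column."""
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    permuted_X = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return loo_mse(X, target, permuted_X) - loo_mse(X, target, X)


importance = {
    name: permutation_importance(factors, potentials, col)
    for col, name in enumerate(["z", "w"])
}
print(f"boundary: y = {intercept:.2f} + {slope:.2f} * x")
print({name: round(value, 3) for name, value in importance.items()})
```

In this toy setting the factor that controls the gap below the boundary (`z`) should receive a much larger importance than the unrelated factor (`w`). A real analysis would likely replace the brute-force fit and the nearest-neighbour model with established quantile regression and machine learning libraries.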

Keywords

Driver-response / Upper boundary of relationship / Interpretable machine learning / Quantile regression / Total phosphorus / Chlorophyll a

Cite this article

Zhongyao Liang, Yaoyang Xu, Gang Zhao, Wentao Lu, Zhenghui Fu, Shuhang Wang, Tyler Wagner. Approaching the upper boundary of driver-response relationships: identifying factors using a novel framework integrating quantile regression with interpretable machine learning. Front. Environ. Sci. Eng., 2023, 17(6): 76. DOI: 10.1007/s11783-023-1676-2



RIGHTS & PERMISSIONS

Higher Education Press


Supplementary files

FSE-22126-OF-LZY_suppl_1
