Collections

Artificial Intelligence/Machine Learning on Environmental Science & Engineering
Editors: Yongsheng Chen, Xiaonan Wang, Joe F. Bozeman III & Shouliang Yi
  • PERSPECTIVES
    Joe F. Bozeman III
    Frontiers of Environmental Science & Engineering, 2024, 18(5): 65. https://doi.org/10.1007/s11783-024-1825-2

    ● Socioecological inequity must be understood to improve environmental data science.

    ● The Systemic Equity Framework and Wells-Du Bois Protocol mitigate inequity.

    ● Addressing irreproducibility in machine learning is vital for bolstering integrity.

    ● Future directions include policy enforcement and systematic programming.

    Socioecological inequity in environmental data science, such as inequity deriving from data-driven approaches and machine learning (ML), is a current issue subject to debate and evolution. There is growing consensus around embedding equity throughout all research and design domains, from inception to administration, while also addressing procedural, distributive, and recognitional factors. Yet, practically doing so may seem onerous or daunting to some. The current perspective helps to alleviate these types of concerns by substantiating the connection between environmental data science and socioecological inequity, using the Systemic Equity Framework, and provides the foundation for a paradigmatic shift toward normalizing the use of equity-centered approaches in environmental data science and ML settings. Bolstering the integrity of environmental data science and ML is just beginning from an equity-centered tool development and rigorous application standpoint. To this end, this perspective also provides relevant future directions and challenges by overviewing some meaningful tools and strategies (such as applying the Wells-Du Bois Protocol, employing fairness metrics, and systematically addressing irreproducibility) and emerging needs and proposals (such as addressing data-proxy bias and supporting convergence research), and it establishes a ten-step path forward. After all, the work that environmental scientists and engineers do ultimately affects the well-being of us all.

  • RESEARCH ARTICLE
    Qiannan Duan, Pengwei Yan, Yichen Feng, Qianru Wan, Xiaoli Zhu
    Frontiers of Environmental Science & Engineering, 2024, 18(5): 55. https://doi.org/10.1007/s11783-024-1815-4

    ● A machine learning path for predicting biochar adsorption efficiency was constructed.

    ● The stacking model exhibited better prediction accuracy and generalization ability.

    ● The proposed method could be used to optimize the preparation conditions of biochars.

    Heavy metals (HMs) are pervasive, highly toxic environmental pollutants with long latency periods, which makes their removal and degradation challenging. Removing heavy metals from the environment is therefore crucial to ensuring water safety. Biochar materials, known for their intricate pore structures and abundant oxygen-containing functional groups, are frequently used to mitigate heavy metal contamination. However, conventional tests for optimizing biochar synthesis and assessing heavy metal adsorption capabilities can be both costly and tedious. To address this challenge, this paper proposes a data-driven machine learning (ML) approach to identify the optimal biochar preparation and adsorption reaction conditions, with the ultimate goal of maximizing adsorption capacity. Using a data set comprising 476 instances of heavy metal adsorption by biochar, seven classical integrated models and one stacking model were trained to rapidly predict the efficiency of heavy metal adsorption by biochar. These predictions were based on diverse physicochemical properties of biochar and the specific adsorption reaction conditions. The results demonstrate that the stacking model, which integrates multiple algorithms, allows for training with fewer samples to achieve higher prediction accuracy and improved generalization ability.

  • RESEARCH ARTICLE
    Wenjie Mai, Zhenguo Chen, Xiaoyong Li, Xiaohui Yi, Yingzhong Zhao, Xinzhong He, Xiang Xu, Mingzhi Huang
    Frontiers of Environmental Science & Engineering, 2024, 18(2): 20. https://doi.org/10.1007/s11783-024-1780-y

    ● A hybrid model is proposed to overcome limitations of single model with time series.

    ● CNN and bidirectional NLSTM are combined to solve complex nonlinear monitoring issue.

    ● Attention mechanism is suitably introduced to hybrid model for better convergence.

    ● TPE is used to find the optimal parameter combination faster than manual tuning.

    Existing automated wastewater treatment control systems face challenges such as reliance on specialized testing instruments, difficult equipment repair, high operational costs, substantial operational errors, and low detection accuracy. An effective soft-measurement model offers a viable approach for real-time monitoring and the development of automated control in the wastewater treatment process. Consequently, a novel hybrid deep learning CNN-BNLSTM-Attention (CBNLSMA) model, which incorporates convolutional neural networks (CNN), bidirectional nested long short-term memory neural networks (BNLSTM), attention mechanisms (AM), and the Tree-structured Parzen Estimator (TPE), has been developed for monitoring effluent water quality during the wastewater treatment process. The CBNLSMA model comprises four stages: the CNN module for feature extraction and data filtering to expedite operations; the BNLSTM module for extracting temporal information from the time-series data; the AM module for model weight reassignment; and the TPE optimization algorithm for hyperparameter search. In comparison with other models (TPE-CNN-BNLSTM, TPE-BNLSTM-AM, TPE-CNN-AM, PSO-CBNLSTMA), the CBNLSMA model reduced the RMSE for effluent COD prediction by 25.4%, decreased the MAPE by 32.9%, and enhanced the R² by 14.9%. For the effluent SS prediction, the CBNLSMA model reduced the RMSE by 26.4% and the MAPE by 21.0%, and improved the R² by 35.7% compared to other models. The simulation results demonstrate that the proposed CBNLSMA model holds significant potential for real-time effluent quality monitoring and for automated control in wastewater treatment processes.
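
The abstract above compares models by RMSE, MAPE, and R², and reports improvements as percentages relative to other models. As a minimal sketch of how such figures are typically computed (the function names and the toy numbers are illustrative assumptions, not taken from the paper), the standard definitions can be written as:

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline

# Hypothetical effluent COD values (mg/L), invented for illustration only.
observed = [40.0, 50.0, 60.0, 80.0]
predicted = [42.0, 48.0, 63.0, 77.0]

scores = {
    "rmse": rmse(observed, predicted),
    "mape": mape(observed, predicted),
    "r2": r2(observed, predicted),
}
```

A statement such as "reduced the RMSE by 25.4%" would then correspond to `relative_reduction(rmse_of_other_model, rmse_of_proposed_model)`, assuming the reduction is expressed relative to the comparison model.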

  • RESEARCH ARTICLE
    Wiley Helm, Shifa Zhong, Elliot Reid, Thomas Igou, Yongsheng Chen
    Frontiers of Environmental Science & Engineering, 2024, 18(2): 17. https://doi.org/10.1007/s11783-024-1777-6

    ● A machine learning approach was applied to predict free chlorine residuals.

    ● Annual data were obtained from chlorination unit at a 98 MGD water treatment plant.

    ● The last model iteration returned a high prediction value (R² = 0.937).

    ● Non-intuitive parameters were found to be highly significant to predictions.

    Chlorine-based disinfection is ubiquitous in conventional drinking water treatment (DWT) and serves to mitigate threats of acute microbial disease caused by pathogens that may be present in source water. An important index of disinfection efficiency is the free chlorine residual (FCR), a regulated disinfection parameter in the US that indirectly measures disinfectant power for prevention of microbial recontamination during DWT and distribution. This work demonstrates how machine learning (ML) can be implemented to improve FCR forecasting when supplied with water quality data from a real, full-scale chlorine disinfection system in Georgia, USA. More precisely, a gradient-boosting ML method (CatBoost) was developed from a full year of DWT plant-generated chlorine disinfection data, including water quality parameters (e.g., temperature, turbidity, pH) and operational process data (e.g., flowrates), to predict FCR. Four gradient-boosting models were implemented, with the highest performance achieving a coefficient of determination, R², of 0.937. SHapley Additive exPlanations (SHAP) values were used to interpret the model's results, uncovering that standard DWT operating parameters, although non-intuitive and theoretically non-causal, vastly improved prediction performance. These results provide a base case for data-driven DWT disinfection supervision and suggest process monitoring methods to provide better information to plant operators for implementation of safe chlorine dosing to maintain optimum FCR.

  • REVIEW ARTICLE
    Yanpeng Huang, Chao Wang, Yuanhao Wang, Guangfeng Lyu, Sijie Lin, Weijiang Liu, Haobo Niu, Qing Hu
    Frontiers of Environmental Science & Engineering, 2024, 18(3): 29. https://doi.org/10.1007/s11783-024-1789-2

    ● The application of ML in groundwater quality assessment and prediction is reviewed.

    ● Bibliometric analysis is performed and summarized to promote application.

    ● The details of the application of ML in GQAP are comprehensively summarized.

    ● Challenges and opportunities of using ML models in GQAP are discussed.

    Groundwater quality assessment and prediction (GQAP) is vital for protecting groundwater resources. Traditional GQAP methods cannot adequately capture the complex relationships among attributes and have the disadvantage of being computationally demanding. Recently, the application of machine learning (ML) in GQAP (GQAPxML) has been widely studied due to ML's reliability and efficiency. While many GQAPxML publications exist, a thorough review is missing. This review provides a comprehensive summary of the development of ML applications in the field of GQAP. First, the workflow of ML modeling is briefly introduced, as are data preparation, model development, model evaluation, and model application. Second, 299 publications related to the topic are filtered, mainly through ML modeling. Subsequently, many aspects of GQAPxML, such as publication trends, the spatial distribution of study areas, the size of data sets, and ML algorithms, are discussed from a bibliometric perspective. In addition, we review in detail the well-established applications and recent findings for several subtopics, including groundwater quality assessment, groundwater quality modeling using groundwater quality parameters, groundwater quality spatial mapping, probability estimation of exceeding the groundwater quality threshold, groundwater quality temporal prediction, and the hybrid use of ML and physics-based models. Finally, the development of GQAPxML is explored from three perspectives: data collection and preprocessing, model building and evaluation, and the broadening of model applications. This review provides a reference for environmental scientists to better understand GQAPxML and promotes the development of innovative methods and improvements in modeling quality.

  • RESEARCH ARTICLE
    Qiyue Wu, Yun Geng, Xinyuan Wang, Dongsheng Wang, ChangKyoo Yoo, Hongbin Liu
    Frontiers of Environmental Science & Engineering, 2024, 18(1): 8. https://doi.org/10.1007/s11783-024-1768-7

    ● PLS-VAER is proposed for modeling of PM2.5 concentration.

    ● Data are decomposed by PLS to capture nonlinear feature.

    ● VAER can improve the predictive performance by variational inference.

    ● The proposed model provides a novel method for monitoring indoor air quality.

    Exposure to poor indoor air conditions poses significant risks to human health, increasing morbidity and mortality rates. Soft measurement modeling is suitable for stable and accurate monitoring of air pollutants and improving air quality. Based on partial least squares (PLS), we propose an indoor air quality prediction model that uses a variational auto-encoder regression (VAER) algorithm. To reduce the negative effects of noise, latent variables in the original data are first extracted by PLS. The extracted variables are then used as inputs to VAER, which improves the accuracy and robustness of the model. Through comparative analysis with traditional methods, we demonstrate the superior performance of our PLS-VAER model, which exhibits improved prediction performance and stability. The root mean square error (RMSE) of PLS-VAER is reduced by 14.71%, 26.47%, and 12.50% compared to single VAER, PLS-SVR, and PLS-ANN, respectively. Additionally, the coefficient of determination (R²) of PLS-VAER improves by 13.70%, 30.09%, and 11.25% compared to single VAER, PLS-SVR, and PLS-ANN, respectively. This research offers an innovative and environmentally friendly approach to monitoring and improving indoor air quality.

  • RESEARCH ARTICLE
    Pengxiao Zhou, Zhong Li, Yimei Zhang, Spencer Snowling, Jacob Barclay
    Frontiers of Environmental Science & Engineering, 2023, 17(12): 152. https://doi.org/10.1007/s11783-023-1752-7

    ● Online learning models accurately predict influent flow rate at wastewater plants.

    ● Models adapt to changing input-output relationships and are friendly to large data.

    ● Online learning models outperform conventional batch learning models.

    ● An optimal prediction strategy is identified through uncertainty analysis.

    ● The proposed models provide support for coping with emergencies like COVID-19.

    Accurate influent flow rate prediction is important for operators and managers at wastewater treatment plants (WWTPs), as it is closely related to wastewater characteristics such as biochemical oxygen demand (BOD), total suspended solids (TSS), and pH. Previous studies on influent flow rate prediction have shown that data-driven models are effective tools. However, most of these studies have focused on batch learning, which is inadequate for wastewater prediction in the era of COVID-19 because the influent pattern has changed significantly. Online learning, which has distinct advantages in dealing with streaming data, large data sets, and changing data patterns, has the potential to address this issue. In this study, the performance of the conventional batch learning models Random Forest (RF), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP), and their respective online learning counterparts Adaptive Random Forest (aRF), Adaptive K-Nearest Neighbors (aKNN), and Adaptive Multi-Layer Perceptron (aMLP), was compared for predicting influent flow rate at two Canadian WWTPs. Online learning models achieved the highest R², the lowest MAPE, and the lowest RMSE compared to conventional batch learning models in all scenarios. The R² values on the testing data set for 24-h ahead prediction of the aRF, aKNN, and aMLP at Plant A were 0.90, 0.73, and 0.87, respectively; these values at Plant B were 0.75, 0.78, and 0.56, respectively. The proposed online learning models are effective in making reliable predictions under changing data patterns, and they are efficient in dealing with continuous and large influent data streams. They can be used to provide robust decision support for wastewater treatment and management in the changing era of COVID-19 and also under other unprecedented emergencies that could change influent patterns.
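
The contrast between batch and online learning described above can be illustrated with a toy stream whose input-output relationship changes partway through. The sketch below is not the aRF, aKNN, or aMLP used in the study; it assumes a one-variable linear model updated by stochastic gradient steps and a synthetic, noise-free stream, purely to show the "test, then train" (prequential) evaluation loop and why a frozen batch model degrades after a drift.

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression (the 'batch' fit)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def make_stream(n=200, drift_at=100):
    """Synthetic stream: y = 2x before the drift, y = 3x + 1 after it."""
    stream = []
    for t in range(n):
        x = 1.0 + t % 5
        y = 2.0 * x if t < drift_at else 3.0 * x + 1.0
        stream.append((x, y))
    return stream

def prequential_mae(stream, slope, intercept, learning_rate):
    """Predict each sample first, then update on it. learning_rate=0 freezes the model."""
    total_abs_error = 0.0
    for x, y in stream:
        error = y - (slope * x + intercept)      # test on the new sample
        total_abs_error += abs(error)
        slope += learning_rate * error * x       # then train on it
        intercept += learning_rate * error
    return total_abs_error / len(stream)

stream = make_stream()
history, future = stream[:100], stream[100:]
slope0, intercept0 = fit_line([x for x, _ in history], [y for _, y in history])

# Both models start from the same batch fit; only the online one keeps learning.
batch_mae = prequential_mae(future, slope0, intercept0, learning_rate=0.0)
online_mae = prequential_mae(future, slope0, intercept0, learning_rate=0.02)
```

On this stream the frozen model keeps the pre-drift relationship and its error stays high, while the online model adapts within a few samples. The learning rate of 0.02 is an arbitrary choice that is stable for inputs in the range used here.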

  • RESEARCH ARTICLE
    Min Cheng, Zhiyuan Zhang, Shihui Wang, Kexin Bi, Kong-qiu Hu, Zhongde Dai, Yiyang Dai, Chong Liu, Li Zhou, Xu Ji, Wei-qun Shi
    Frontiers of Environmental Science & Engineering, 2023, 17(12): 148. https://doi.org/10.1007/s11783-023-1748-3

    ● Screened 8862 metal-organic frameworks for I₂ capture via molecular simulation.

    ● Ranked metal-organic frameworks on predicted I₂ uptake and identified Top 10.

    ● Established quantitative structure-property relationships via machine learning.

    We performed large-scale molecular simulation to screen and identify metal-organic framework materials for gaseous iodine capture, as part of our ongoing effort in addressing management and handling issues of various radionuclides in the grand scheme of spent nuclear fuel reprocessing. Starting from the computation-ready experimental (CoRE) metal-organic frameworks (MOFs) database, grand canonical Monte Carlo simulation was employed to predict the iodine uptake values of the MOFs. A ranking list of MOFs based on their iodine uptake capabilities was generated, with the Top 10 candidates identified and their respective adsorption sites visualized. Subsequently, machine learning was used to establish structure-property relationships to correlate MOFs’ various structural and chemical features with their corresponding performances in iodine capture, yielding interpretable common features and design rules for viable MOF adsorbents. The research strategy and framework of the present study could aid the development of high-performing MOF adsorbents for capture and recovery of radioactive iodine, and moreover, other volatile environmentally hazardous species.

  • RESEARCH ARTICLE
    Weishuai Li, Jingang Huang, Zhuoer Shi, Wei Han, Ting Lü, Yuanyuan Lin, Jianfang Meng, Xiaobing Xu, Pingzhi Hou
    Frontiers of Environmental Science & Engineering, 2023, 17(11): 135. https://doi.org/10.1007/s11783-023-1735-8

    ● Data-driven approach was used to simulate VFA production from WAS fermentation.

    ● Three machine learning models were developed and evaluated.

    ● XGBoost showed best prediction performance and excellent generalization ability.

    ● pH and protein were the top two input features for the modeling.

    ● The maximal VFA production was predicted to be 650 mg COD/g VSS.

    Riboflavin is a redox mediator that promotes volatile fatty acid (VFA) production from waste activated sludge (WAS), making riboflavin-mediated fermentation a promising method for WAS reuse. However, time- and labor-consuming experiments make it challenging to obtain the optimal operating conditions for maximal VFA production. In this study, three machine learning (ML) models were developed to predict VFA production from riboflavin-mediated WAS fermentation systems. Among the three tested ML algorithms, eXtreme Gradient Boosting (XGBoost) presented the best prediction performance and excellent generalization ability, with the highest testing coefficient of determination (R² of 0.93) and lowest root mean square error (RMSE of 0.070). Analysis of feature importance and feature interactions using the SHapley Additive exPlanations (SHAP) method indicated that pH and soluble protein were the top two input features for the modeling. The intrinsic correlations between input features and microbial communities corroborated this deduction. Using the optimized ML model, a genetic algorithm (GA) and particle swarm optimization (PSO) were applied to find the optimal VFA output, predicting a maximum of 650 mg COD/g VSS. This study provides a data-driven approach to predict and optimize VFA production from riboflavin-mediated WAS fermentation.

  • RESEARCH ARTICLE
    Haoyang Xian, Pinjing He, Dongying Lan, Yaping Qi, Ruiheng Wang, Fan Lü, Hua Zhang, Jisheng Long
    Frontiers of Environmental Science & Engineering, 2023, 17(10): 121. https://doi.org/10.1007/s11783-023-1721-1

    ● A method based on ATR-FTIR and ML was developed to predict CHNS contents in waste.

    ● Feature selection methods were used to improve models’ prediction accuracy.

    ● The best model predicted C, H, and N contents with accuracy R² ≥ 0.93, 0.87, 0.97.

    ● Some suitable models showed insensitivity to spectral noise.

    ● Under moisture interference, the models still had good prediction performance.

    Elemental composition is a key parameter in solid waste treatment and disposal. This study proposes a method based on infrared spectroscopy and machine learning algorithms that can rapidly predict the elemental composition (C, H, N, S) of solid waste. Both noise and moisture spectral interference that may occur in practical application are investigated. By comparing two feature selection methods and five machine learning algorithms, the most suitable models are selected. Moreover, the impacts of noise and moisture on the models are discussed, with paper, plastic, textiles, wood, and leather as examples of recyclable waste components. The results show that the combination of the feature selection and K-nearest neighbor (KNN) approaches exhibits the best prediction performance and generalization ability. In particular, the coefficients of determination (R²) for the validation set, cross-validation, and test set are higher than 0.93, 0.89, and 0.97 for predicting the C, H, and N contents, respectively. Further, KNN is less sensitive to noise. Under moisture interference, the combination of feature selection and support vector regression or partial least-squares regression shows satisfactory results. Therefore, the elemental compositions of solid waste are quickly and accurately predicted under noise and moisture disturbances using infrared spectroscopy and machine learning algorithms.

  • RESEARCH ARTICLE
    Xiaohua Fu, Qingxing Zheng, Guomin Jiang, Kallol Roy, Lei Huang, Chang Liu, Kun Li, Honglei Chen, Xinyu Song, Jianyu Chen, Zhenxing Wang
    Frontiers of Environmental Science & Engineering, 2023, 17(8): 98. https://doi.org/10.1007/s11783-023-1698-9

    ● Data acquisition and pre-processing for wastewater treatment were summarized.

    ● A PSO-SVR model for predicting CODeff in wastewater was proposed.

    ● The CODeff prediction performances of the three models in the paper were compared.

    ● The CODeff prediction effects of different models in other studies were discussed.

    Mining-beneficiation wastewater treatment is highly complex and nonlinear. Various factors, such as influent quality, flow rate, pH, and chemical dose, tend to limit the effectiveness of mining-beneficiation wastewater treatment. Chemical oxygen demand (COD) is a crucial indicator of the quality of mining-beneficiation wastewater. Accurately predicting the COD concentration of mining-beneficiation wastewater after treatment is essential for achieving stable and compliant discharge, which reduces environmental risk and significantly improves the discharge quality of wastewater. This paper presents a novel AI algorithm, PSO-SVR, to predict water quality. The proposed PSO-SVR model uses particle swarm optimization to tune the hyperparameters of support vector regression for COD prediction. The generalization capacity of the PSO-SVR model, tested on out-of-distribution (OOD) data, is strong: the root mean square error (RMSE) is 1.51, the mean absolute error (MAE) is 1.26, and the coefficient of determination (R²) is 0.85. We compare the performance of the PSO-SVR model with that of a back propagation neural network (BPNN) and a radial basis function neural network (RBFNN) and show that it outperforms both in terms of RMSE, MAE, and R², making it the best of the three models for COD prediction of mining-beneficiation wastewater. This is because PSO-SVR has less tendency to overfit than the neural network architectures. Our proposed PSO-SVR model is optimal for the prediction of COD in copper-molybdenum mining-beneficiation wastewater treatment. In addition, PSO-SVR can be used to predict COD in a wide variety of wastewaters through transfer learning.
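
To make the idea of hyperparameter search by particle swarm optimization concrete, here is a minimal sketch of a standard global-best PSO. It is not the authors' implementation: the objective `toy_validation_error` is a hypothetical stand-in for the cross-validated error of an SVR as a function of two log-scaled hyperparameters, and the swarm settings (20 particles, inertia 0.7, acceleration coefficients 1.5) are common textbook defaults assumed for illustration.

```python
import random

def toy_validation_error(log_c, log_gamma):
    """Hypothetical error surface with its minimum at log_c = 1, log_gamma = -2."""
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best PSO over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                  # each particle's best position
    best_val = [objective(*p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]      # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(*pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)]
)
```

In a real PSO-SVR workflow the objective would train and validate a support vector regressor for each candidate position, which is far more expensive than the toy function used here.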

  • RESEARCH ARTICLE
    Xinwan Zhang, Guangyuan Meng, Jinwen Hu, Wanzi Xiao, Tong Li, Lehua Zhang, Peng Chen
    Frontiers of Environmental Science & Engineering, 2023, 17(8): 97. https://doi.org/10.1007/s11783-023-1697-x

    ● Titanium-based flow-through electrode achieved high Cr(VI) reduction efficiency.

    ● Flow-through pattern enhanced the mass transfer and reduced cathodic polarization.

    ● BPNN predicted the optimal electroreduction conditions of flow-through cell.

    Flow-through electrodes have been demonstrated to be effective for electroreduction of Cr(VI), but they suffer from tedious preparation and short lifetimes. Herein, commercially available porous titanium was studied as a flow-through electrode for Cr(VI) electroreduction. In addition, intelligent prediction of electrolytic performance based on a back propagation neural network (BPNN) was developed. Voltammetric studies revealed that Cr(VI) electroreduction was a diffusion-controlled process. Use of the flow-through mode achieved a high limiting diffusion current as a result of enhanced mass transfer and favorable kinetics. Electroreduction of Cr(VI) in the flow-through system was 1.95 times higher than in a parallel-plate electrode system. When the influent (initial pH 2.0 and 106 mg/L Cr(VI)) was treated at 5.0 V and a flux of 51 L/(h·m²), a reduction efficiency of ~99.9% was obtained without a cyclic electrolysis process. Sulfate served as the supporting electrolyte and pH regulator, as reactive CrSO₇²⁻ species were formed as a result of feeding HSO₄⁻. Cr(III) was confirmed as the final product due to the sequential three-electron transport or disproportionation of the intermediate. The developed BPNN model achieved good prediction accuracy with respect to Cr(VI) electroreduction with a high correlation coefficient (R² = 0.943). Additionally, the electroreduction efficiencies for various operating inputs were predicted based on the BPNN model, which demonstrates the evolutionary role of intelligent systems in future electrochemical technologies.

  • REVIEW ARTICLE
    Yang Zhang, Mei Lei, Kai Li, Tienan Ju
    Frontiers of Environmental Science & Engineering, 2023, 17(8): 93. https://doi.org/10.1007/s11783-023-1693-1

    ● A review of machine learning (ML) for spatial prediction of soil contamination.

    ● ML has achieved significant breakthroughs in soil contamination prediction.

    ● A structured guideline for using ML in soil contamination is proposed.

    ● The guideline includes variable selection, model evaluation, and interpretation.

    Soil pollution levels can be quantified via sampling and experimental analysis; however, sampling is performed at discrete points with long distances owing to limited funding and human resources, and is insufficient to characterize the entire study area. Spatial prediction is required to comprehensively investigate potentially contaminated areas. Consequently, machine learning models that can simulate complex nonlinear relationships between a variety of environmental conditions and soil contamination have recently become popular tools for predicting soil pollution. The characteristics, advantages, and applications of machine learning models used to predict soil pollution are reviewed in this study. Satisfactory model performance generally requires the following: 1) selection of the most appropriate model with the required structure; 2) selection of appropriate independent variables related to pollutant sources and pathways to improve model interpretability; 3) improvement of model reliability through comprehensive model evaluation; and 4) integration of geostatistics with the machine learning model. With the enrichment of environmental data and development of algorithms, machine learning will become a powerful tool for predicting the spatial distribution and identifying sources of soil contamination in the future.

  • RESEARCH ARTICLE
    Zhaocai Wang, Qingyu Wang, Tunhua Wu
    Frontiers of Environmental Science & Engineering, 2023, 17(7): 88. https://doi.org/10.1007/s11783-023-1688-y

    ● A novel VMD-IGOA-LSTM model is proposed for the prediction of water quality.

    ● Improved model quickly converges to the global optimal fitness and remains stable.

    ● The prediction accuracy of water quality parameters is significantly improved.

    Water quality prediction is vital for addressing water pollution and protecting the water environment. Because water quality parameters are nonlinear, unstable, and random, a short-term water quality prediction model was proposed that combines variational mode decomposition (VMD) with an improved grasshopper optimization algorithm (IGOA) to optimize a long short-term memory neural network (LSTM). First, VMD was adopted to decompose the water quality data into a series of relatively stable components, with the aim of reducing the instability of the original data and increasing predictability; each component was then input into the IGOA-LSTM model for prediction. Finally, the predictions for the individual components were added to obtain the predicted values. In this study, monitoring data from Dayangzhou Station and Shengmi Station on the Ganjiang River were used for training and prediction. The experimental results showed that the prediction accuracy of the proposed VMD-IGOA-LSTM model was higher than that of the integrated model of Ensemble Empirical Mode Decomposition (EEMD), the integrated model of Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), Nonlinear Autoregressive Network with Exogenous Inputs (NARX), Recurrent Neural Network (RNN), and other models, showing better performance in short-term prediction. The current study provides a reliable solution for water quality prediction studies in other areas.

  • RESEARCH ARTICLE
    Zhongyao Liang, Yaoyang Xu, Gang Zhao, Wentao Lu, Zhenghui Fu, Shuhang Wang, Tyler Wagner
    Frontiers of Environmental Science & Engineering, 2023, 17(6): 76. https://doi.org/10.1007/s11783-023-1676-2

    ● A novel framework integrating quantile regression with machine learning is proposed.

    ● It aims to identify factors driving observations to upper boundary of relationship.

    ● Increasing N:P and TN concentration helps fulfill the effect of TP on CHL.

    ● Wetter and warmer climates decrease the potential and make eutrophication control more difficult.

    ● The framework advances applications of quantile regression and machine learning.

    The identification of factors that may be forcing ecological observations to approach the upper boundary provides insight into potential mechanisms affecting driver-response relationships, and can help inform ecosystem management, but has rarely been explored. In this study, we propose a novel framework integrating quantile regression with interpretable machine learning. In the first stage of the framework, we estimate the upper boundary of a driver-response relationship using quantile regression. Next, we calculate “potentials” of the response variable depending on the driver, which are defined as vertical distances from the estimated upper boundary of the relationship to observations in the driver-response variable scatter plot. Finally, we identify key factors impacting the potential using a machine learning model. We illustrate the necessary steps to implement the framework using the total phosphorus (TP)-Chlorophyll a (CHL) relationship in lakes across the continental US. We found that the nitrogen to phosphorus ratio (N:P), annual average precipitation, total nitrogen (TN), and summer average air temperature were key factors impacting the potential of CHL depending on TP. We further revealed important implications of our findings for lake eutrophication management. The important role of N:P and TN on the potential highlights the co-limitation of phosphorus and nitrogen and indicates the need for dual nutrient criteria. Future wetter and/or warmer climate scenarios can decrease the potential which may reduce the efficacy of lake eutrophication management. The novel framework advances the application of quantile regression to identify factors driving observations to approach the upper boundary of driver-response relationships.
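
The first two stages of the framework described above (estimating an upper boundary by quantile regression, then measuring each observation's vertical distance to it) can be sketched in a few lines. This is an illustrative sketch only: it assumes a straight-line boundary, a quantile level of 0.9 chosen arbitrarily, and invented data. It relies on the fact that a linear quantile regression with two parameters has an optimal line passing through two data points, so a brute-force search over point pairs is exact for small data sets. The third stage, relating the potentials to covariates with a machine learning model, is not shown.

```python
from itertools import combinations

def pinball_loss(residual, tau):
    """Quantile (pinball) loss for one residual = observed - predicted."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def fit_quantile_line(xs, ys, tau):
    """Exact linear quantile regression by enumerating lines through point pairs."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(pinball_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

def potentials(xs, ys, intercept, slope):
    """Vertical distance from the fitted upper boundary down to each observation."""
    return [intercept + slope * x - y for x, y in zip(xs, ys)]

# Invented driver (x) and response (y) values on an arbitrary scale.
driver = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
response = [1.0, 2.0, 3.0, 4.0, 5.0, 0.5, 0.5, 1.0, 3.0, 2.0]

intercept, slope = fit_quantile_line(driver, response, tau=0.9)
gaps = potentials(driver, response, intercept, slope)
```

Observations lying on the boundary have a potential of zero, and larger values mark observations that fall further below what the driver alone would allow. For data sets of realistic size, a linear-programming or iterative quantile regression solver would replace the pairwise search.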

  • RESEARCH ARTICLE
    Junlang Li, Zhenguo Chen, Xiaoyong Li, Xiaohui Yi, Yingzhong Zhao, Xinzhong He, Zehua Huang, Mohamed A. Hassaan, Ahmed El Nemr, Mingzhi Huang
    Frontiers of Environmental Science & Engineering, 2023, 17(6): 67. https://doi.org/10.1007/s11783-023-1667-3

    ● Hybrid deep-learning model is proposed for water quality prediction.

    ● Tree-structured Parzen Estimator is employed to optimize the neural network.

    ● Developed model performs well in accuracy and uncertainty.

    ● Usage of the proposed model can reduce carbon emission and energy consumption.

    The anaerobic process is regarded as green and sustainable because of its low carbon emission and minimal energy consumption in wastewater treatment plants (WWTPs). However, some water quality metrics cannot be measured in real time, which hampers operators' judgment and may increase energy consumption and carbon emission. One solution is a soft-sensor prediction technique. This article introduces a water quality soft-sensor prediction method based on a Bidirectional Gated Recurrent Unit (BiGRU) combined with Gaussian Process Regression (GPR) and optimized by the Tree-structured Parzen Estimator (TPE). TPE automatically optimizes the hyperparameters of BiGRU; BiGRU is trained to produce the point prediction, and GPR provides the interval prediction. A case study then applies this method to an actual anaerobic process (2500 m³/d). Results show that TPE effectively optimizes the hyperparameters of BiGRU. For point prediction of CODeff and biogas yield, the R² values of BiGRU are 0.973 and 0.939, respectively, which are 1.03%–7.61% and 1.28%–10.33% higher than those of the other models, and a valid prediction interval can be obtained. In addition, probability prediction and reliability evaluation indicate that the proposed model is reliable for the anaerobic process. It is expected to provide accurate and reliable water quality prediction, giving operators in WWTPs a basis for controlling the reactor and minimizing carbon emission and energy consumption.

  • RESEARCH ARTICLE
    Yirong Hu, Wenjie Du, Cheng Yang, Yang Wang, Tianyin Huang, Xiaoyi Xu, Wenwei Li
    Frontiers of Environmental Science & Engineering, 2023, 17(5): 55. https://doi.org/10.1007/s11783-023-1655-7

    ● A machine learning model was used to identify lake nutrient pollution sources.

    ● XGBoost model showed the best performance for lake water quality prediction.

    ● Model feature size was reduced by screening the key features with the MIC method.

    ● TN and TP concentrations of Lake Taihu are mainly affected by endogenous sources.

    ● Next-month lake TN and TP concentrations were predicted accurately.

    Effective control of lake eutrophication necessitates a full understanding of the complicated nitrogen and phosphorus pollution sources, for which mathematical modeling is commonly adopted. In contrast to the conventional knowledge-based models that usually perform poorly due to insufficient knowledge of pollutant geochemical cycling, we employed an ensemble machine learning (ML) model to identify the key nitrogen and phosphorus sources of lakes. Six ML models were developed based on 13 years of historical data of Lake Taihu’s water quality, environmental input, and meteorological conditions, among which the XGBoost model stood out as the best model for total nitrogen (TN) and total phosphorus (TP) prediction. The results suggest that the lake TN is mainly affected by the endogenous load and inflow river water quality, while the lake TP is predominantly from endogenous sources. The prediction of the lake TN and TP concentration changes in response to these key feature variations suggests that endogenous source control is a highly desirable option for lake eutrophication control. Finally, one-month-ahead prediction of lake TN and TP concentrations (R² of 0.85 and 0.95, respectively) was achieved based on this model with sliding time window lengths of 9 and 6 months, respectively. Our work demonstrates the great potential of using ensemble ML models for lake pollution source tracking and prediction, which may provide valuable references for early warning and rational control of lake eutrophication.
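
The sliding time window mentioned in this abstract can be illustrated with a short sketch that turns a monthly series into (window, next-month target) pairs. The function name `make_windows`, the `horizon` parameter, and the placeholder series are assumptions for illustration; the abstract does not describe how the authors built their samples, only that window lengths of 9 and 6 months were used.

```python
def make_windows(series, window, horizon=1):
    """Build supervised samples from a time series.

    Each input holds `window` consecutive values; its target is the value
    `horizon` steps after the window ends (horizon=1 means next month).
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Placeholder monthly values 1.0 .. 12.0, not measured concentrations.
monthly_values = [float(v) for v in range(1, 13)]

# Window lengths of 9 and 6 months, as quoted in the abstract for TN and TP.
tn_inputs, tn_targets = make_windows(monthly_values, window=9)
tp_inputs, tp_targets = make_windows(monthly_values, window=6)
```

The resulting input/target pairs could then be passed to any regressor; the choice of model is outside this sketch.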

  • REVIEW ARTICLE
    Rui Liang, Chao Chen, Akash Kumar, Junyu Tao, Yan Kang, Dong Han, Xianjia Jiang, Pei Tang, Beibei Yan, Guanyi Chen
    Frontiers of Environmental Science & Engineering, 2023, 17(4): 44. https://doi.org/10.1007/s11783-023-1644-x

    ● State-of-the-art applications of machine learning (ML) in solid waste (SW) are presented.

    ● Changes of research field over time, space, and hot topics were analyzed.

    ● Detailed application scenarios of ML on the life cycle of SW were summarized.

    ● Perspectives towards future development of ML in the field of SW were discussed.

    Machine learning (ML) is widely used in solid waste (SW) research because of its strengths in data processing. This study analyzed the research and developmental progress of the applications of ML in the life cycle of SW. Statistical analyses were undertaken on the literature published between 1985 and 2021 in the Science Citation Index Expanded and Social Sciences Citation Index to provide an overview of the progress. Based on the articles considered, a rapid upward trend from 1985 to 2021 was found, and international cooperation was found to have strengthened. Three topics, namely SW categories, ML algorithms, and specific applications, were discussed as applied to the life cycle of SW. ML has been applied during the entire SW process, thereby affecting its life cycle. ML was used to predict the generation and characteristics of SW, optimize its collection and transportation, and model the processing of its energy utilization. Finally, the current challenges of applying ML to SW and future perspectives were discussed. The goal is to achieve high economic and environmental benefits and carbon reduction during the life cycle of SW. ML plays an important role in the modernization and intellectualization of SW management. It is hoped that this work provides a constructive overview of the state-of-the-art development of SW disposal.

  • RESEARCH ARTICLE
    Yuanxin Zhang, Fei Li, Chaoqiong Ni, Song Gao, Shuwei Zhang, Jin Xue, Zhukai Ning, Chuanming Wei, Fang Fang, Yongyou Nie, Zheng Jiao
    Frontiers of Environmental Science & Engineering, 2023, 17(2): 21. https://doi.org/10.1007/s11783-023-1621-4

    ● Used a double-stage attention mechanism model to predict ozone.

    ● The model can autonomously select the appropriate time series for forecasting.

    ● The model outperforms other machine learning models and WRF-CMAQ.

    ● We used the model to analyze the driving factors of VOCs that cause ozone pollution.

    Ozone is becoming a significant air pollutant in some regions, and VOCs are essential for ozone prediction as necessary ozone precursors. In this study, we proposed a recurrent neural network based on a double-stage attention mechanism model to predict ozone, selected an appropriate time series for prediction through the input attention and temporal attention mechanisms, and analyzed the cause of ozone generation according to the contribution of feature parameters. The experimental data show that our model had an RMSE of 7.71 μg/m³ and a mean absolute error of 5.97 μg/m³ for 1-h predictions. The DA-RNN model predicted ozone closer to observations than the other models. Based on the importance of the characteristics, we found that the ozone pollution in the Jinshan Industrial Zone mainly comes from the emissions of petrochemical enterprises, and the good generalization performance of the model is proved through testing multiple stations. Our experimental results demonstrate the validity and promising application of the DA-RNN model in predicting atmospheric pollutants and investigating their causes.
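
For readers unfamiliar with the two error metrics quoted in this abstract, the sketch below shows how root mean square error (RMSE) and mean absolute error (MAE) are computed from paired observations and predictions. The function names and the sample values are made up for illustration and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up hourly ozone values in μg/m³, for illustration only.
observed_o3 = [100.0, 120.0, 90.0, 110.0]
predicted_o3 = [104.0, 117.0, 90.0, 105.0]
example_rmse = rmse(observed_o3, predicted_o3)
example_mae = mae(observed_o3, predicted_o3)
```

RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.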

  • RESEARCH ARTICLE
    Zhengheng Pu, Jieru Yan, Lei Chen, Zhirong Li, Wenchong Tian, Tao Tao, Kunlun Xin
    Frontiers of Environmental Science & Engineering, 2023, 17(2): 22. https://doi.org/10.1007/s11783-023-1622-3

    ● A novel deep learning framework for short-term water demand forecasting.

    ● Model prediction accuracy outperforms other traditional deep learning models.

    ● Wavelet multi-resolution analysis automatically extracts key water demand features.

    ● An analysis is performed to explain the improved mechanism of the proposed method.

    Short-term water demand forecasting provides guidance on real-time water allocation in the water supply network, which helps water utilities reduce energy cost and avoid potential accidents. Although a variety of methods have been proposed to improve forecast accuracy, it is still difficult for statistical models to learn the periodic patterns due to the chaotic nature of water demand data with high temporal resolution. To overcome this issue from the perspective of improving data predictability, we proposed a hybrid Wavelet-CNN-LSTM model that combines the time-frequency decomposition characteristics of Wavelet Multi-Resolution Analysis (MRA) with an advanced deep learning model, CNN-LSTM. Four models (ANN, Conv1D, LSTM, and GRUN) are compared with Wavelet-CNN-LSTM, and the results show that Wavelet-CNN-LSTM outperforms the other models in both single-step and multi-step prediction. Further mechanistic analysis revealed that MRA significantly improves model accuracy.
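
The idea behind wavelet multi-resolution analysis can be sketched with the simplest wavelet, the Haar wavelet: the signal is split into detail components at successive scales plus a smooth approximation, and these components add back up to the original signal. The abstract does not state which wavelet family or number of levels the authors used, so the Haar choice, the two levels, the function names, and the sample values below are all assumptions for illustration.

```python
def block_means(signal, size):
    """Replace each consecutive block of `size` samples by the block mean."""
    smooth = []
    for start in range(0, len(signal), size):
        block = signal[start:start + size]
        smooth.extend([sum(block) / size] * size)
    return smooth


def haar_mra(signal, levels):
    """Haar multi-resolution analysis.

    Returns [detail_1, ..., detail_levels, approximation], each with the
    same length as the input; summing them element-wise restores the signal.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2 ** levels")
    components = []
    previous = list(signal)
    for level in range(1, levels + 1):
        # The Haar approximation at this level is the mean over 2**level samples.
        smooth = block_means(signal, 2 ** level)
        # The detail is what was lost when moving to the coarser approximation.
        components.append([p - s for p, s in zip(previous, smooth)])
        previous = smooth
    components.append(previous)
    return components


# Made-up demand readings, for illustration only.
demand = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
detail_1, detail_2, approximation = haar_mra(demand, levels=2)
rebuilt = [a + b + c for a, b, c in zip(detail_1, detail_2, approximation)]
```

In a forecasting pipeline, components like these could serve as separate input channels to a downstream model, although how the study wires them into the CNN-LSTM is not described in the abstract.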