Online machine learning for stream wastewater influent flow rate prediction under unprecedented emergencies
Pengxiao Zhou, Zhong Li, Yimei Zhang, Spencer Snowling, Jacob Barclay
● Online learning models accurately predict influent flow rate at wastewater plants.
● Models adapt to changing input-output relationships and handle large data sets efficiently.
● Online learning models outperform conventional batch learning models.
● An optimal prediction strategy is identified through uncertainty analysis.
● The proposed models provide support for coping with emergencies like COVID-19.
Accurate influent flow rate prediction is important for operators and managers at wastewater treatment plants (WWTPs), as it is closely related to wastewater characteristics such as biochemical oxygen demand (BOD), total suspended solids (TSS), and pH. Previous studies on influent flow rate prediction have shown that data-driven models are effective tools. However, most of these studies have focused on batch learning, which is inadequate for wastewater prediction in the era of COVID-19 because the influent pattern has changed significantly. Online learning, which has distinct advantages in dealing with stream data, large data sets, and changing data patterns, has the potential to address this issue. In this study, the performance of the conventional batch learning models Random Forest (RF), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP) and of their respective online learning counterparts Adaptive Random Forest (aRF), Adaptive K-Nearest Neighbors (aKNN), and Adaptive Multi-Layer Perceptron (aMLP) was compared for predicting influent flow rate at two Canadian WWTPs. In all scenarios, the online learning models achieved higher R², lower mean absolute percentage error (MAPE), and lower root mean square error (RMSE) than the conventional batch learning models. The R² values on the testing data set for 24-h ahead prediction with the aRF, aKNN, and aMLP were 0.90, 0.73, and 0.87, respectively, at Plant A and 0.75, 0.78, and 0.56, respectively, at Plant B. The proposed online learning models make reliable predictions under changing data patterns and deal efficiently with continuous and large influent data streams. They can provide robust decision support for wastewater treatment and management in the changing era of COVID-19 and under other unprecedented emergencies that could change influent patterns.
Wastewater prediction / Data stream / Online learning / Batch learning / Influent flow rates
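The contrast between batch and online learning described in the abstract can be illustrated with a minimal sketch. It is not the authors' aRF, aKNN, or aMLP implementation: it uses a plain sliding-window nearest-neighbour predictor as a simplified stand-in for an online learner, a synthetic hourly flow series with an artificial pattern change, and the assumption that the true flow becomes available immediately after each prediction. All function names, parameters, and numeric values below are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import deque

HOURS_PER_DAY = 24


def make_stream(n_days=60, shift_day=30, seed=0):
    """Synthetic hourly flow (hour_of_day, flow) with a pattern change at shift_day."""
    rng = random.Random(seed)
    stream = []
    for t in range(n_days * HOURS_PER_DAY):
        hour = t % HOURS_PER_DAY
        if t < shift_day * HOURS_PER_DAY:
            # Assumed "before" regime: higher level, stronger daily cycle.
            base = 100.0 + 30.0 * math.sin(2 * math.pi * (hour - 6) / 24)
        else:
            # Assumed "after" regime: lower level, flatter and later peak.
            base = 80.0 + 15.0 * math.sin(2 * math.pi * (hour - 10) / 24)
        stream.append((hour, base + rng.gauss(0.0, 2.0)))
    return stream


def knn_predict(memory, hour, k=3):
    """Average the flows of the k stored samples closest in hour of day."""
    def circular_gap(sample):
        d = abs(sample[0] - hour)
        return min(d, HOURS_PER_DAY - d)

    nearest = sorted(memory, key=circular_gap)[:k]
    return sum(flow for _, flow in nearest) / len(nearest)


def score(y_true, y_pred):
    """Return R2, MAPE (percent), and RMSE for paired observations and predictions."""
    n = len(y_true)
    sq_err = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    mean = sum(y_true) / n
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    abs_pct = sum(abs((a - b) / a) for a, b in zip(y_true, y_pred))
    return {
        "r2": 1.0 - sq_err / ss_tot,
        "mape": 100.0 * abs_pct / n,
        "rmse": math.sqrt(sq_err / n),
    }


def run_comparison(train_days=20, window_hours=72):
    """Compare a frozen batch model with a test-then-train online model."""
    stream = make_stream()
    split = train_days * HOURS_PER_DAY
    train, test = stream[:split], stream[split:]

    batch_memory = list(train)  # fitted once, never updated
    online_memory = deque(train[-window_hours:], maxlen=window_hours)

    y_true, batch_pred, online_pred = [], [], []
    for hour, flow in test:
        # Predict first, then reveal the observation to the online model only.
        batch_pred.append(knn_predict(batch_memory, hour))
        online_pred.append(knn_predict(online_memory, hour))
        y_true.append(flow)
        online_memory.append((hour, flow))  # oldest sample drops out automatically

    return {
        "batch": score(y_true, batch_pred),
        "online": score(y_true, online_pred),
    }


results = run_comparison()
for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this toy setting the frozen batch model keeps predicting the old daily pattern after the change, while the sliding window lets the online model forget outdated samples within a few days. The window length controls that trade-off: a shorter window adapts faster but averages over fewer observations.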