Forecasting SARS-CoV-2 outbreak through wastewater analysis: a success in wastewater-based epidemiology
Rubén Cañas Cañas, Raimundo Seguí López-Peñalver, Jorge Casaña Mohedo, José Vicente Benavent Cervera, Julio Fernández Garrido, Raúl Juárez Vela, Ana Pellín Carcelén, Óscar García-Algar, Vicente Gea Caballero, Vicente Andreu-Fernández
● Virus detection in wastewater is a valuable tool for anticipating outbreaks.
● Feature engineering has proven valuable for developing predictive models.
● LightGBM models robustly generalize predictions across an entire region.
● Explainable ML frameworks are crucial for confidence in model predictions.
● WBE is a valuable tool that can help Public Health authorities in decision-making.
The COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), triggered a global emergency that exposed the urgent need for surveillance approaches to monitor the dynamics of viral transmission. Several epidemiological tools that may help anticipate outbreaks have been developed. Wastewater-based epidemiology (WBE) is a non-invasive, population-wide methodology for tracking the epidemiological evolution of the virus. However, its limitations, robustness, and intricacies still need to be thoroughly evaluated and understood before this strategy can be used effectively. The aim of this study was to train highly accurate predictive models using SARS-CoV-2 concentrations in wastewater in a region consisting of several municipalities. The chosen region was Catalonia (Spain), given the availability of wastewater SARS-CoV-2 quantification from the Catalan surveillance network and healthcare data (clinical cases) from the regional government. By using various feature engineering and machine learning methods, we developed a model that predicts accurately and generalizes successfully across the municipalities that make up Catalonia. Explainable Machine Learning frameworks were also used, which allowed us to understand the factors that influence the model's decision-making. Our findings support wastewater-based epidemiology as a potential surveillance tool to assist public health authorities in anticipating and monitoring outbreaks.
SARS-CoV-2 / Wastewater based epidemiology / Surveillance / Machine learning / Predictive models / Model explainability