Forecasting SARS-CoV-2 outbreak through wastewater analysis: a success in wastewater-based epidemiology

Rubén Cañas Cañas , Raimundo Seguí López-Peñalver , Jorge Casaña Mohedo , José Vicente Benavent Cervera , Julio Fernández Garrido , Raúl Juárez Vela , Ana Pellín Carcelén , Óscar García-Algar , Vicente Gea Caballero , Vicente Andreu-Fernández

Front. Environ. Sci. Eng. ›› 2025, Vol. 19 ›› Issue (1) : 12

DOI: 10.1007/s11783-025-1932-8
RESEARCH ARTICLE


Abstract

The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), triggered a global emergency that exposed the urgent need for surveillance approaches to monitor the dynamics of viral transmission. Several epidemiological tools that may help anticipate outbreaks have been developed. Wastewater-based epidemiology (WBE) is a non-invasive, population-wide methodology for tracking the epidemiological evolution of the virus. However, its limitations, robustness, and intricacies still need to be thoroughly evaluated and understood before the strategy can be used effectively. The aim of this study was to train highly accurate predictive models using SARS-CoV-2 concentrations in wastewater in a region consisting of several municipalities. The chosen region was Catalonia (Spain), given the availability of wastewater SARS-CoV-2 quantification data from the Catalan surveillance network and healthcare data (clinical cases) from the regional government. Using various feature engineering and machine learning methods, we developed a model that predicts accurately and generalizes successfully across the municipalities that make up Catalonia. Explainable machine learning frameworks were also used, which allowed us to understand the factors that influence the model's decision-making. Our findings support wastewater-based epidemiology as a potential surveillance tool to assist public health authorities in anticipating and monitoring outbreaks.

Keywords

SARS-CoV-2 / Wastewater-based epidemiology / Surveillance / Machine learning / Predictive models / Model explainability

Highlights

● Virus detection in wastewater is a valuable tool for anticipating outbreaks.

● Feature engineering has proven valuable for developing predictive models.

● LightGBM models robustly generalize predictions across an entire region.

● Explainable ML frameworks are crucial for confidence in model predictions.

● WBE is a valuable tool that can help public health authorities in decision-making.

Cite this article

Rubén Cañas Cañas, Raimundo Seguí López-Peñalver, Jorge Casaña Mohedo, José Vicente Benavent Cervera, Julio Fernández Garrido, Raúl Juárez Vela, Ana Pellín Carcelén, Óscar García-Algar, Vicente Gea Caballero, Vicente Andreu-Fernández. Forecasting SARS-CoV-2 outbreak through wastewater analysis: a success in wastewater-based epidemiology. Front. Environ. Sci. Eng., 2025, 19(1): 12. DOI: 10.1007/s11783-025-1932-8



RIGHTS & PERMISSIONS

The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn


Supplementary files

FSE-24120-of-CRC_suppl_1
