Forecasting SARS-CoV-2 outbreak through wastewater analysis: a success in wastewater-based epidemiology

Rubén Cañas Cañas, Raimundo Seguí López-Peñalver, Jorge Casaña Mohedo, José Vicente Benavent Cervera, Julio Fernández Garrido, Raúl Juárez Vela, Ana Pellín Carcelén, Óscar García-Algar, Vicente Gea Caballero, Vicente Andreu-Fernández

Front. Environ. Sci. Eng. ›› 2025, Vol. 19 ›› Issue (1) : 12. DOI: 10.1007/s11783-025-1932-8
RESEARCH ARTICLE


Highlights

● Virus detection in wastewater is a valuable tool for anticipating outbreaks.

● Feature engineering has proven valuable for developing predictive models.

● LightGBM models robustly generalize predictions across an entire region.

● Explainable ML frameworks are crucial for confidence in model predictions.

● Wastewater-based epidemiology (WBE) is a valuable tool that can help public health authorities in decision-making.

Abstract

The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), triggered a global emergency that exposed the urgent need for surveillance approaches capable of monitoring the dynamics of viral transmission. Several epidemiological tools that may help anticipate outbreaks have been developed. Wastewater-based epidemiology is a non-invasive, population-wide methodology for tracking the epidemiological evolution of the virus. However, its limitations, robustness, and intricacies have not yet been thoroughly evaluated, which hinders its effective use. The aim of this study was to train highly accurate predictive models using SARS-CoV-2 concentrations in wastewater across a region comprising several municipalities. The chosen region was Catalonia (Spain), given the availability of wastewater SARS-CoV-2 quantification data from the Catalan surveillance network and healthcare data (clinical cases) from the regional government. Using various feature engineering and machine learning methods, we developed a model that predicts accurately and generalizes across the municipalities that make up Catalonia. Explainable machine learning frameworks were also used, which allowed us to understand the factors that influence the model's predictions. Our findings support wastewater-based epidemiology as a potential surveillance tool to assist public health authorities in anticipating and monitoring outbreaks.

Keywords

SARS-CoV-2 / Wastewater-based epidemiology / Surveillance / Machine learning / Predictive models / Model explainability

Cite this article

Rubén Cañas Cañas, Raimundo Seguí López-Peñalver, Jorge Casaña Mohedo, José Vicente Benavent Cervera, Julio Fernández Garrido, Raúl Juárez Vela, Ana Pellín Carcelén, Óscar García-Algar, Vicente Gea Caballero, Vicente Andreu-Fernández. Forecasting SARS-CoV-2 outbreak through wastewater analysis: a success in wastewater-based epidemiology. Front. Environ. Sci. Eng., 2025, 19(1): 12 https://doi.org/10.1007/s11783-025-1932-8

CRediT Authorship Contribution Statement

Mr. Rubén Cañas Cañas: Methodology, Software, Formal analysis, Investigation, Data Curation, Writing - original draft, Writing - review and editing. Dr. Seguí López-Peñalver: Methodology, Formal analysis, Data Curation, Resources, Writing - original draft, Writing - review and editing. Dr. Casaña Mohedo: Methodology, Formal analysis, Visualization, Writing - original draft, Writing - review and editing. Dr. Benavent Cervera: Conceptualization, Investigation, Resources, Formal analysis, Writing - review and editing. Dr. Fernández Garrido: Conceptualization, Investigation, Resources, Formal analysis, Writing - review and editing. Dr. Pellín Carcelén: Methodology, Investigation, Resources, Writing - review and editing. Dr. García-Algar: Methodology, Investigation, Resources, Writing - review and editing. Dr. Juárez Vela: Methodology, Investigation, Resources, Writing - review and editing. Dr. Gea Caballero: Conceptualization, Investigation, Visualization, Validation, Supervision, Writing - review and editing, Formal analysis. Dr. Andreu-Fernández: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Funding acquisition, Supervision, Project administration, Writing - original draft, Writing - review and editing. All authors critically reviewed the manuscript and approved the final version as submitted.

Acknowledgements

This research was funded by the Valencian International University and Generalitat Valenciana (GVA) through the Grants to emerging research groups 2023 (CE2023) from the Regional Ministry of Innovation, Universities, Science and Digital Society (CIGE/2022/58). We would like to thank the Catalan Surveillance Network of SARS-CoV-2 in Sewage and the government of Catalonia for generating high-quality data and making it publicly available for research. We would also like to thank the Biosanitary Research Institute of Valencian International University (VIU) and Global Omnium for their support in the development of this project.

Conflict of Interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Electronic Supplementary Material

Supplementary material is available in the online version of this article at https://doi.org/10.1007/s11783-025-1932-8 and is accessible for authorized users.

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

© The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn