Functional data analysis: An application to COVID-19 data in the United States in 2020

Chen Tang, Tiandong Wang, Panpan Zhang

Quant. Biol. ›› 2022, Vol. 10 ›› Issue (2) : 172-187. DOI: 10.15302/J-QB-022-0300
RESEARCH ARTICLE

Abstract

Background: In this paper, we analyze the COVID-19 data in the United States in 2020 via functional data analysis methods. We investigate the effectiveness of public health measures and assess the correlation between infections and deaths caused by COVID-19. Additionally, we examine the relationship between COVID-19 spread and geographical location, and propose a forecasting method to predict the total number of confirmed cases nationwide.

Methods: The functional data analysis methods include functional principal component analysis, functional canonical correlation analysis, an expectation-maximization (EM) based clustering algorithm, and a functional time series model used for forecasting.

Results: It is evident that the practice of public health measures helps to reduce the growth rate of the epidemic across the nation. We observe a high canonical correlation between confirmed cases and deaths. States that are geographically close to the hot spots are likely to be clustered together, and population density appears to be a critical factor affecting the cluster structure. The proposed functional time series model gives more reliable and accurate predictions of the total number of confirmed cases than standard time series methods.

Conclusions: The results obtained by applying the functional data analysis methods provide new insights into the COVID-19 data in the United States. With our results and recommendations, health professionals can make better decisions to reduce the spread of the epidemic and mitigate its negative effects on national public health.

Author summary

We study the COVID-19 time series data in the United States in 2020 via functional data analysis methods. We find that the practice of public health measures helps to reduce the spread of the epidemic, and that there is a high canonical correlation between confirmed cases and deaths. The cluster structure at the state level is closely related to the states' geographical locations. Using a functional time series model, we predict the total number of infections in the nation more accurately than standard time series methods do.

Keywords

COVID-19 / canonical correlation / cluster analysis / functional time series / forecasting / principal component analysis

Cite this article

Chen Tang, Tiandong Wang, Panpan Zhang. Functional data analysis: An application to COVID-19 data in the United States in 2020. Quant. Biol., 2022, 10(2): 172‒187 https://doi.org/10.15302/J-QB-022-0300

ACKNOWLEDGMENTS

The authors would like to thank two anonymous referees for their valuable comments on the manuscript.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Chen Tang, Tiandong Wang and Panpan Zhang declare that they have no conflict of interest or financial conflicts to disclose. All procedures performed in studies were in accordance with the ethical standards of the institution or practice at which the studies were conducted, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

© 2022 The Author(s). Published by Higher Education Press.