Functional data analysis: An application to COVID-19 data in the United States in 2020
Chen Tang, Tiandong Wang, Panpan Zhang
Background: In this paper, we analyze COVID-19 data from the United States in 2020 using functional data analysis methods. We investigate the effectiveness of public health measures and assess the correlation between infections and deaths caused by COVID-19. We also examine the relationship between the spread of COVID-19 and geographical location, and propose a forecasting method for the total number of confirmed cases nationwide.
Methods: The functional data analysis methods include functional principal component analysis, functional canonical correlation analysis, an expectation-maximization (EM) based clustering algorithm, and a functional time series model for forecasting.
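As a rough illustration of the first method listed above, the sketch below computes functional principal components for curves observed densely on a shared, equally spaced grid (for example, one cumulative case curve per state) by eigen-decomposing the discretized covariance operator. It is a minimal sketch under that assumption, not the estimator used in the paper: smoothing-based approaches such as the one in the R package fdapace also handle sparse or irregular observations, which this code does not attempt. The function name `fpca` and the synthetic data are hypothetical.

```python
import numpy as np


def fpca(curves, grid, n_components=2):
    """Discretized FPCA for curves observed densely on a shared, equally spaced grid.

    curves: array of shape (n_curves, n_points), one curve per row
    grid:   array of shape (n_points,)
    Returns the mean function, eigenvalues, eigenfunctions (columns) and FPC scores.
    """
    curves = np.asarray(curves, dtype=float)
    dt = grid[1] - grid[0]                           # quadrature weight of the equally spaced grid
    mean = curves.mean(axis=0)                       # estimated mean function mu(t)
    centered = curves - mean
    cov = centered.T @ centered / (len(curves) - 1)  # covariance surface G(s, t) on the grid
    # Eigen-decompose the discretized covariance operator f -> integral of G(s, t) f(t) dt
    evals, evecs = np.linalg.eigh(cov * dt)
    order = np.argsort(evals)[::-1][:n_components]   # largest eigenvalues first
    evals = evals[order]
    phis = evecs[:, order] / np.sqrt(dt)             # rescale so that integral of phi_k^2 dt = 1
    scores = centered @ phis * dt                    # xi_ik = integral of (X_i - mu) phi_k dt (Riemann sum)
    return mean, evals, phis, scores


# Usage with synthetic curves X_i(t) = a_i t + b_i t^2 (illustrative data only)
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 101)
coefs = rng.normal(size=(30, 2))
curves = coefs[:, :1] * grid + coefs[:, 1:] * grid**2
mean, evals, phis, scores = fpca(curves, grid, n_components=2)
print(evals)  # variance captured by each component
```

The FPC scores returned here are the low-dimensional summaries on which later steps, such as canonical correlation, clustering, or forecasting, can operate.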
Results: It is evident that public health measures help to reduce the growth rate of the outbreak nationwide. We observe a high canonical correlation between confirmed and death cases. States that are geographically close to the hot spots are likely to be clustered together, and population density appears to be a critical factor affecting the cluster structure. The proposed functional time series model gives more reliable and accurate predictions of the total number of confirmed cases than standard time series methods.
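One simple way to estimate a canonical correlation between two sets of curves, such as confirmed-case and death curves for the same states, is to project each set onto its leading principal components and run classical canonical correlation analysis on the scores. Truncating to a few components acts as regularization; without some regularization, canonical correlation analysis on densely observed curves is ill-posed and can yield spurious correlations near 1. The sketch below assumes curves on a shared grid and uses the QR/SVD formulation of CCA; the helper names and the simulated data are hypothetical, and this is not necessarily the estimator the authors used.

```python
import numpy as np


def fpc_scores(curves, n_components):
    """Leading principal component scores of curves sampled on a shared, equally spaced grid.

    Grid weights are omitted: a constant rescaling does not change canonical correlations.
    """
    centered = curves - curves.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def first_canonical_correlation(x, y):
    """Largest canonical correlation between the columns of x and y (rows are paired samples)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Canonical correlations are the singular values of qx^T qy, where qx and qy
    # are orthonormal bases of the centered column spaces.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)[0]


# Illustrative data: 40 "states" whose death curves share a latent factor with case curves
rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 60)
latent = rng.normal(size=40)
confirmed = np.outer(latent, grid) + 0.01 * rng.normal(size=(40, 60))
deaths = np.outer(0.02 * latent, grid**2) + 1e-4 * rng.normal(size=(40, 60))
rho = first_canonical_correlation(fpc_scores(confirmed, 2), fpc_scores(deaths, 2))
print(round(rho, 3))
```

The number of retained components controls the trade-off: too few may miss shared variation, while too many reintroduces the overfitting that drives sample canonical correlations toward 1.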
Conclusions: Applying functional data analysis methods provides new insights into the COVID-19 data in the United States. With our results and recommendations, health professionals can make better decisions to reduce the spread of the epidemic and mitigate its negative effects on national public health.
We study COVID-19 time series data from the United States in 2020 using functional data analysis methods. We find that public health measures help to reduce the spread of the epidemic and that there is a high canonical correlation between confirmed and death cases. The state-level cluster structure is closely related to the states' geographical locations. Using a functional time series model, we predict the total number of infections nationwide more accurately than standard time series methods.
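The abstract does not specify the form of the functional time series model. For orientation only, the sketch below shows a generic approach in the spirit of the functional principal component forecasting of Hyndman and Ullah (2007): reduce a sequence of curves (for example, daily counts within consecutive weeks) to principal component scores, forecast each score series one step ahead, and map the forecast back to a curve. The AR(1)-with-intercept score model, the function name `forecast_next_curve`, and the simulated series are assumptions chosen for brevity; they are not the authors' model.

```python
import numpy as np


def forecast_next_curve(curves, n_components=2):
    """One-step-ahead forecast of the next curve in a functional time series.

    curves: array of shape (n_periods, n_points); row t is the curve for period t,
    for example daily counts within week t, on a shared grid.
    Each score series is forecast with an AR(1) model with intercept (an assumption).
    """
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Principal directions: rows of vt from the SVD of the centered data matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                 # (n_components, n_points), orthonormal rows
    scores = centered @ basis.T               # (n_periods, n_components)
    next_scores = np.empty(n_components)
    for k in range(n_components):
        s = scores[:, k]
        # Least-squares fit of s[t] = c + phi * s[t-1], then extrapolate one step
        design = np.column_stack([np.ones(len(s) - 1), s[:-1]])
        (c, phi), *_ = np.linalg.lstsq(design, s[1:], rcond=None)
        next_scores[k] = c + phi * s[-1]
    return mean + next_scores @ basis


# Illustrative series: a fixed weekly shape whose level follows a_t = 0.5 + 0.8 * a_{t-1}
grid = np.linspace(0, 1, 7)
shape = 1 + np.sin(np.pi * grid)
levels = [1.0]
for _ in range(19):
    levels.append(0.5 + 0.8 * levels[-1])
curves = np.outer(levels, shape)
print(forecast_next_curve(curves, n_components=1))
```

Forecasting a handful of score series instead of every grid point keeps the number of fitted models small, while the principal directions carry the within-period shape of the curves.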
COVID-19 / canonical correlation / cluster analysis / functional time series / forecasting / principal component analysis