Functional data analysis: An application to COVID-19 data in the United States in 2020

Chen Tang, Tiandong Wang, Panpan Zhang

Quant. Biol., 2022, Vol. 10, Issue 2: 172-187. DOI: 10.15302/J-QB-022-0300
RESEARCH ARTICLE

Abstract

Background: In this paper, we analyze the 2020 COVID-19 data for the United States using functional data analysis methods. We investigate the effectiveness of public health measures and assess the correlation between infections and deaths caused by COVID-19. We also examine the relationship between the spread of COVID-19 and geographical location, and propose a forecasting method to predict the total number of confirmed cases nationwide.

Methods: The functional data analysis methods used include functional principal component analysis, functional canonical correlation analysis, an expectation-maximization (EM) based clustering algorithm, and a functional time series model for forecasting.
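
The Methods paragraph only names the techniques. As a generic illustration of the first one, functional principal component analysis, and not the authors' implementation, the Python sketch below estimates the mean function, the leading eigenvalue and eigenfunction, and the principal component scores of curves observed densely on a common, equally spaced grid. It assumes no presmoothing, approximates every integral by a Riemann sum with grid spacing `dt`, and finds the leading eigenvector of the sample covariance matrix by power iteration from a seeded random start. All function names are hypothetical.

```python
import random
from math import sqrt


def mean_curve(curves):
    """Pointwise sample mean of curves sampled on the same grid."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]


def sample_covariance(curves, mu):
    """Sample covariance C[a][b] of the curve values at grid points a and b."""
    n, m = len(curves), len(mu)
    cov = [[0.0] * m for _ in range(m)]
    for c in curves:
        d = [c[j] - mu[j] for j in range(m)]
        for a in range(m):
            for b in range(m):
                cov[a][b] += d[a] * d[b] / (n - 1)
    return cov


def leading_eigenpair(mat, iters=1000, tol=1e-10, seed=0):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    m = len(mat)
    rng = random.Random(seed)  # random start: almost surely not orthogonal to the target
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: every direction has eigenvalue 0
            break
        w = [x / norm for x in w]
        done = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if done:
            break
    # Rayleigh quotient gives the eigenvalue for the final unit vector.
    lam = sum(v[a] * sum(mat[a][b] * v[b] for b in range(m)) for a in range(m))
    # Eigenvectors are defined up to sign: make the largest-magnitude entry positive.
    k = max(range(m), key=lambda j: abs(v[j]))
    if v[k] < 0:
        v = [-x for x in v]
    return lam, v


def first_fpc(curves, dt):
    """Mean, leading eigenvalue/eigenfunction and FPC scores on an equally spaced grid."""
    mu = mean_curve(curves)
    lam, v = leading_eigenpair(sample_covariance(curves, mu))
    # Integrals are Riemann sums with spacing dt, so the covariance operator is
    # approximated by C * dt; rescale so that sum(phi**2) * dt == 1.
    eigenvalue = lam * dt
    phi = [x / sqrt(dt) for x in v]
    scores = [sum((c[j] - mu[j]) * phi[j] * dt for j in range(len(mu))) for c in curves]
    return mu, eigenvalue, phi, scores


if __name__ == "__main__":
    # Toy data: constant-in-t curves with scores -2..2, observed at the
    # midpoints of four equal subintervals of [0, 1].
    toy = [[s] * 4 for s in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    mu, ev, phi, scores = first_fpc(toy, dt=0.25)
    print(round(ev, 6), [round(p, 6) for p in phi], [round(s, 6) for s in scores])
```

Multiplying the matrix eigenvalue by `dt` and dividing the eigenvector by `sqrt(dt)` turns the matrix eigenpair into an approximate eigenpair of the covariance operator, normalized so that ∫φ(t)² dt = 1. Observed case counts are typically noisy, so in practice a smoothing step, or an estimator designed for sparse data, would normally come before this computation.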

Results: It is evident that the implementation of public health measures helped reduce the growth rate of the outbreak across the nation. We observe a high canonical correlation between confirmed cases and deaths. States that are geographically close to the hot spots are likely to be clustered together, and population density appears to be a critical factor affecting the cluster structure. The proposed functional time series model gives more reliable and accurate predictions of the total number of confirmed cases than standard time series methods.

Conclusions: Applying functional data analysis methods provides new insights into the COVID-19 data in the United States. Our results and recommendations can help health professionals make better decisions to reduce the spread of the epidemic and mitigate its negative effects on national public health.

Keywords

COVID-19 / canonical correlation / cluster analysis / functional time series / forecasting / principal component analysis

Cite this article

Chen Tang, Tiandong Wang, Panpan Zhang. Functional data analysis: An application to COVID-19 data in the United States in 2020. Quant. Biol., 2022, 10(2): 172-187. DOI: 10.15302/J-QB-022-0300

RIGHTS & PERMISSIONS

The Author(s). Published by Higher Education Press.
