Functional data analysis: An application to COVID-19 data in the United States in 2020

Chen Tang, Tiandong Wang, Panpan Zhang

Quant. Biol. 2022, Vol. 10, Issue 2: 172-187. DOI: 10.15302/J-QB-022-0300




Background: In this paper, we analyze COVID-19 data in the United States in 2020 using functional data analysis methods. We investigate the effectiveness of public health measures and assess the correlation between infections and deaths caused by COVID-19. Additionally, we examine the relationship between the spread of COVID-19 and geographical location, and propose a forecasting method to predict the total number of confirmed cases nationwide.

Methods: The functional data analysis methods include functional principal component analysis, functional canonical correlation analysis, an expectation-maximization (EM) based clustering algorithm, and a functional time series model used for forecasting.

Results: The practice of public health measures helps to reduce the growth rate of the epidemic outbreak across the nation. We observe a high canonical correlation between confirmed and death cases. States that are geographically close to the hot spots are likely to be clustered together, and population density appears to be a critical factor affecting the cluster structure. The proposed functional time series model gives more reliable and accurate predictions of the total number of confirmed cases than standard time series methods.

Conclusions: The results obtained by applying the functional data analysis methods provide new insights into the COVID-19 data in the United States. With our results and recommendations, health professionals can make better decisions to reduce the spread of the epidemic and mitigate its negative effects on national public health.

Author summary

We study the COVID-19 time series data in the United States in 2020 via functional data analysis methods. We find that the practice of public health measures helps to reduce the spread of the epidemic, and that there is a high canonical correlation between confirmed and death cases. The cluster structure at the state level is closely related to the states' geographical locations. Using a functional time series model, we predict the total number of infections in the nation more accurately than standard time series methods.



COVID-19 / canonical correlation / cluster analysis / functional time series / forecasting / principal component analysis

Cite this article

Chen Tang, Tiandong Wang, Panpan Zhang. Functional data analysis: An application to COVID-19 data in the United States in 2020. Quant. Biol., 2022, 10(2): 172‒187




The authors would like to thank two anonymous referees for their valuable comments on the manuscript.


The authors Chen Tang, Tiandong Wang and Panpan Zhang declare that they have no conflict of interest or financial conflicts to disclose. All procedures performed in studies were in accordance with the ethical standards of the institution or practice at which the studies were conducted, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.


This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


2022 The Author(s). Published by Higher Education Press.




