INTRODUCTION
The coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This virus clusters together with the bat SARS-CoVs and is regarded as a SARS-like virus [1]. The disease has so far claimed at least 3000 lives in China, and Japan, Thailand, and the U.S. have already reported COVID-19 infection cases [2]. Mysteriously, SARS-CoV-2 antibodies were detected among individuals enrolled in a prospective lung cancer screening trial in Italy as early as September 2019 [3], suggesting that COVID-19 may have begun spreading in multiple geographical locations even before it was identified in China [4]. To contain its devastating power, the Chinese government has mobilized enormous manpower and poured vast resources into combating this virus. Unfortunately, the shadow of COVID-19 has already fallen over more than 100 countries. The course of this epidemic may not be on everyone's tongue, but it is always on everyone's mind. It is therefore compelling to develop a method that predicts, in real time, where this epidemic is heading.
Here we present a mathematical strategy to forecast the development of the COVID-19 epidemic in China by nowcasting. First, we provided next-day nowcasts of the COVID-19 epidemic based on a susceptible-infected-recovered (SIR) model, from which we concluded that the COVID-19 transmission rate in China started to slow down after Jan 30. Second, once the infection nowcasts were no longer accurate, we used linear regression to forecast the transmission inflexion point of the COVID-19 epidemic, which indicated that the inflexion point would arrive between Feb 17 and 18. Finally, we used a susceptible-exposed-infected-recovered (SEIR) model to simulate the possible development of this epidemic in China throughout 2020 in accordance with our predictions. Since then, the COVID-19 epidemic has escalated into a global pandemic. We also applied our method to COVID-19 data from the USA and the world, and the results suggest that there is no sign of imminent amelioration of this pandemic.
RESULTS AND DISCUSSION
At the very beginning, we did not know that SARS-CoV-2 has an incubation period. Thus, we started with a susceptible-infected-recovered (SIR) model with 20,000,000 possible susceptibles (twice the population of Wuhan, the city where the disease started) and 27 initially infected cases (news reports stated that there were 27 confirmed infected cases on Dec 31, 2019) [5]. The susceptible population was set as large as possible because, as of Jan 24, 2020, we did not know how many people this disease was going to affect.
Using data from the World Health Organization's COVID-19 situation reports and the SIR model, we produced next-day nowcasts of the numbers of infections and recoveries starting from Jan 23, 2020. We did not use the SEIR model for nowcasting, because the number of exposed cases was unknown and would not be reported. By adjusting the transmission rate (b) and recovery rate (g) in our SIR model, we first calibrated the model to the data reported on the previous day as closely as possible. For example, the reported infected and recovered cases on Jan 23 were 830 and 34, respectively. By adjusting b and g, the model's infection and recovery numbers on Jan 23 were set to 836.96 and 37.44. The model then predicted 1020.23 infections and 43.76 recoveries for Jan 24, whereas 1287 infected and 38 recovered cases were reported on that day (Tables 1 and 2). We provided nowcasts only for the next day, because the transmission of COVID-19 in China was strongly influenced by government control measures, e.g., the lockdown of Wuhan and traffic restrictions in the other provinces, which made its development a stochastic process and long-term prediction inaccurate.
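The calibrate-then-nowcast procedure described above can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation, and several choices in it are assumptions not specified in the text: frequency-dependent SIR equations (dS/dt = -bSI/N, dI/dt = bSI/N - gI, dR/dt = gI), a fixed-step fourth-order Runge-Kutta integrator, a simulation starting on Dec 31, 2019 with 27 infected cases, and a grid search over b and g that minimizes the squared relative error against the reported counts. The function names and grid ranges are hypothetical.

```python
from itertools import product


def sir_rk4_step(state, b, g, n, h):
    """Advance (S, I, R) by one fourth-order Runge-Kutta step of size h (days).

    Assumed model: dS/dt = -b*S*I/N, dI/dt = b*S*I/N - g*I, dR/dt = g*I.
    """
    def rhs(st):
        s, i, _ = st
        infection = b * s * i / n
        return (-infection, infection - g * i, g * i)

    def shift(st, k, scale):
        return tuple(x + scale * dx for x, dx in zip(st, k))

    k1 = rhs(state)
    k2 = rhs(shift(state, k1, h / 2))
    k3 = rhs(shift(state, k2, h / 2))
    k4 = rhs(shift(state, k3, h))
    return tuple(
        x + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
        for x, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )


def run_sir(state, b, g, days, steps_per_day=10):
    """Integrate the SIR model for a whole number of days; return the final (S, I, R)."""
    n = sum(state)  # N = S + I + R is constant in this closed model
    h = 1.0 / steps_per_day
    for _ in range(days * steps_per_day):
        state = sir_rk4_step(state, b, g, n, h)
    return state


def calibrate_and_nowcast(initial, days_to_calibration, reported_i, reported_r,
                          b_grid, g_grid):
    """Grid-search (b, g) so the model matches the reported I and R on the
    calibration day, then nowcast the following day with the chosen rates."""
    best = None
    for b, g in product(b_grid, g_grid):
        _, i_cal, r_cal = run_sir(initial, b, g, days_to_calibration)
        # Squared relative error, so infections and recoveries weigh comparably.
        err = ((i_cal - reported_i) / reported_i) ** 2 + ((r_cal - reported_r) / reported_r) ** 2
        if best is None or err < best[0]:
            best = (err, b, g)
    _, b, g = best
    calibrated = run_sir(initial, b, g, days_to_calibration)
    nowcast = run_sir(calibrated, b, g, 1)
    return b, g, calibrated, nowcast


if __name__ == "__main__":
    # Dec 31, 2019: 20,000,000 susceptibles and 27 infected (values from the text).
    start = (20_000_000.0, 27.0, 0.0)
    # Hypothetical search grids; Jan 23 is 23 days after Dec 31.
    b_grid = [round(0.10 + 0.01 * k, 2) for k in range(21)]    # 0.10 .. 0.30
    g_grid = [round(0.002 + 0.001 * k, 3) for k in range(19)]  # 0.002 .. 0.020
    b, g, jan23, jan24 = calibrate_and_nowcast(start, 23, 830, 34, b_grid, g_grid)
    print(f"b={b}, g={g}; Jan 23 model I={jan23[1]:.2f}, R={jan23[2]:.2f}")
    print(f"Jan 24 nowcast: I={jan24[1]:.2f}, R={jan24[2]:.2f}")
```

A grid search keeps the sketch dependency-free and transparent; a gradient-based optimizer or a finer grid would be natural refinements, and the calibrated values it returns are not the authors' parameters.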
Comparing our nowcasts with the reported data reveals the general trend of the COVID-19 epidemic in China (Figs. 1A and 1B). Figure 1A shows that before Jan 30, 2020, our model's infection predictions were always lower than the reported infection numbers, but after that date they were higher than the reported numbers, except on Feb 12, when clinically diagnosed cases began to be added to the total number of infected cases. Thus, we concluded that the COVID-19 transmission rate started to slow down after Jan 30. Figure 1B shows that our model's recovery predictions were always higher than the reported recovery numbers between Jan 29 and Feb 10. Together, these results suggest that the situation began to improve from the last day of January 2020.
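The comparison behind this conclusion amounts to checking, day by day, the sign of the difference between the one-day-ahead nowcast and the reported count. A minimal Python sketch follows; the function name and the ISO date keys are illustrative choices, and the only data point used is the Jan 24 infection pair quoted earlier.

```python
def compare_nowcasts(predicted, reported):
    """Label each day present in both mappings as 'under', 'over', or 'equal'.

    'under': the nowcast fell below the reported count, so the epidemic grew
    faster than the calibrated model expected; 'over': it grew more slowly.
    """
    labels = {}
    for day in sorted(predicted.keys() & reported.keys()):
        diff = predicted[day] - reported[day]
        if diff < 0:
            labels[day] = "under"
        elif diff > 0:
            labels[day] = "over"
        else:
            labels[day] = "equal"
    return labels


# Jan 24, 2020 infections: nowcast 1020.23 vs. 1287 reported (values from the text).
print(compare_nowcasts({"2020-01-24": 1020.23}, {"2020-01-24": 1287}))
# {'2020-01-24': 'under'}
```

A run of "under" labels followed by a sustained run of "over" labels is the pattern described around Jan 30; days affected by a change in case definition, such as Feb 12, would need to be treated separately.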
After Feb 1, our infection nowcasts became increasingly inaccurate. The daily number of new infections peaked at 14,108 on Feb 12 (1820 lab-confirmed cases and 12,288 clinically diagnosed cases) and declined sharply afterwards; at that point we switched to a different mathematical tool to forecast the inflexion point of COVID-19 transmission in China. Two linear regression lines were fitted to the daily numbers of new infections and new recoveries, and the two fitted lines cross between Feb 17 and 18 (Fig. 2A). We therefore inferred that the inflexion point of the COVID-19 epidemic in China would appear around Feb 18.
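The crossing-point calculation can be sketched as follows, assuming ordinary least-squares fits of daily counts against a day index. The crossing is the day on which fitted daily recoveries overtake fitted daily new infections, which the text treats as the inflexion point. The daily counts in the usage lines are hypothetical placeholders, not the reported data for China, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def crossing_point(line_a, line_b):
    """Return x where two fitted lines intersect, or None if they are parallel."""
    (a0, a1), (b0, b1) = line_a, line_b
    if a1 == b1:
        return None
    # a0 + a1*x = b0 + b1*x  =>  x = (b0 - a0) / (a1 - b1)
    return (b0 - a0) / (a1 - b1)


# Hypothetical daily counts (day 0 = first day of the fitting window).
days = [0, 1, 2, 3, 4]
new_infections = [3000, 2800, 2600, 2400, 2200]
new_recoveries = [1000, 1200, 1400, 1600, 1800]
print(crossing_point(fit_line(days, new_infections), fit_line(days, new_recoveries)))
# 5.0
```

A fractional crossing value maps back to a calendar date by adding it to the first day of the fitting window, which is how a crossing can fall "between" two days.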
If the inflexion point did appear on Feb 18, we estimated that this disease would infect at least 80,000 people, almost 1% of Wuhan's population. We then used the SEIR model to simulate what would happen if the inflexion point appeared on Feb 18, in accordance with our predictions above. With the susceptible population set to 82,000, the transmission rate (b) to 0.33, the recovery rate (g) to 0.016, and the incubation rate (s) to 0.3, the simulation forecast 60,445 infected cases and 14,545 recovered cases on Feb 18, 2020 (Fig. 2B). The actual reported numbers of infected and recovered cases on Feb 18, 2020 were 57,805 and 14,376. According to this forecast, COVID-19 would still be with us at the end of 2020 if the recovery rate were not improved (Fig. 2B). After Feb 18, the number of remaining COVID-19 patients in China declined day by day [6].
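The SEIR scenario above can be outlined with the following minimal Python sketch, which is not the authors' R code. It assumes the standard SEIR equations dS/dt = -bSI/N, dE/dt = bSI/N - sE, dI/dt = sE - gI, dR/dt = gI, the parameter values quoted above, and initial conditions that the text does not state (27 infected and no exposed or recovered individuals on Dec 31, 2019). Under these assumed initial conditions the output need not match the 60,445 and 14,545 forecast in Fig. 2B. The parameter `s_rate` stands for s and is renamed to avoid a clash with S; the function names are hypothetical.

```python
def seir_rhs(state, b, g, s_rate, n):
    """Right-hand side of the assumed SEIR equations for state = (S, E, I, R)."""
    s, e, i, _ = state
    infection = b * s * i / n   # S -> E
    onset = s_rate * e          # E -> I
    recovery = g * i            # I -> R
    return (-infection, infection - onset, onset - recovery, recovery)


def simulate_seir(initial, b, g, s_rate, days, steps_per_day=10):
    """Integrate with fixed-step RK4; return a list with one (S, E, I, R) per day."""
    n = sum(initial)
    h = 1.0 / steps_per_day
    state = tuple(initial)
    trajectory = [state]

    def shift(st, k, scale):
        return tuple(x + scale * dx for x, dx in zip(st, k))

    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = seir_rhs(state, b, g, s_rate, n)
            k2 = seir_rhs(shift(state, k1, h / 2), b, g, s_rate, n)
            k3 = seir_rhs(shift(state, k2, h / 2), b, g, s_rate, n)
            k4 = seir_rhs(shift(state, k3, h), b, g, s_rate, n)
            state = tuple(
                x + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                for x, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
            )
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # N = 82,000 with an assumed start of 27 infected on Dec 31, 2019.
    initial = (82_000 - 27, 0, 27, 0)
    # 366 days reaches Dec 31, 2020 (2020 is a leap year).
    traj = simulate_seir(initial, b=0.33, g=0.016, s_rate=0.3, days=366)
    feb18 = traj[49]  # Dec 31 + 49 days = Feb 18, 2020
    print(f"Feb 18: infected {feb18[2]:.0f}, recovered {feb18[3]:.0f}")
    print(f"Dec 31, 2020: infected {traj[-1][2]:.0f}")
```

With g = 0.016 the mean time spent in the infected compartment is 1/g = 62.5 days, so infected individuals remain in this simulated scenario at the end of 2020, consistent with the remark that the recovery rate would need to improve.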
The mean incubation period of COVID-19 is around 3 days, and the basic reproduction number (R0) of COVID-19 is estimated to be 3.28 [7, 8]. In our SEIR simulation model, the incubation rate (s) is 0.3; since the incubation period equals 1/s, this corresponds to about 3.3 days, which is quite close to the value above. However, R0 in our model is b/g = 0.33/0.016 = 20.625, much larger than other estimates. We set the transmission rate (b) and the recovery rate (g) according to when the inflexion point would appear and how many infected and recovered cases would be predicted on Feb 18. Because our SEIR simulation closely matches the numbers reported on Feb 18, the extra-large R0 is unlikely to be an artefact. Modeling of the control of the COVID-19 outbreak in China shows that R0 is not a stationary, fixed value [9]. Across three epidemic dynamic stages, Zhao and Chen estimate that R0 in Wuhan ranges from 4.7 to 0.47 [9]. The small R0 at the third stage of the COVID-19 epidemic in Wuhan demonstrates that strict quarantine measures are very effective. The extra-large R0 in our SEIR simulation suggests that COVID-19 quickly exhausted the potential susceptible population, i.e., SARS-CoV-2 reached every possible susceptible in China within a very short period of time. It thus supports, from another perspective, the effectiveness of China's strict quarantine measures.
On March 23, the number of new infected cases reached zero and the number of remaining infected cases was 4287, much lower than our SEIR simulation's estimate (Fig. 2B). Moreover, the actual number of recovered cases was higher than our SEIR simulation's estimate. These actual numbers of infected and recovered cases suggest that the Chinese government not only implemented effective infection control and prevention measures but also actively provided medical care for infected patients. The former decreases the COVID-19 transmission rate, and the latter increases the recovery rate of infected patients. This is why our results show that the scale of China's COVID-19 epidemic was much smaller than previous estimates and that China could contain the epidemic within two months. Another study has also confirmed the effectiveness of the Chinese government's COVID-19 response, especially its infection control and prevention measures [10].
By the end of March, the COVID-19 epidemic had developed into a global pandemic, and the worst COVID-19 scenario is now playing out on the world stage. We applied our method to COVID-19 data from the USA and the world between July 16 and August 2 to forecast the general trend of the pandemic for the rest of 2020. The USA currently has the world's largest number of COVID-19 cases. Figure 3 shows our nowcasts of infected and recovered cases alongside the reported infected and recovered cases in the USA and the world. The reported infected cases are not significantly lower than the predicted ones in either the USA or the world (Figs. 3A and 3C), and on some days of July (e.g., July 22), the reported infected cases were even higher than the predicted ones. A similar trend can be seen in the reported and predicted recovered cases (Figs. 3B and 3D). Unlike what we observed in Fig. 1A, there is no clear division between the reported and predicted infected cases in the USA and the world. This result suggests that there is so far no sign of imminent amelioration of this pandemic. Unfortunately, the COVID-19 pandemic is very likely to last into next year.
DATA AND METHODS
Daily COVID-19 data for China were obtained from the World Health Organization's situation reports (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/). Daily COVID-19 data for the USA and the world were retrieved from the WHO coronavirus disease (COVID-19) dashboard (https://covid19.who.int/). The SIR and SEIR models and the linear regressions were implemented in R (version 3.6.2).
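The differential equations of the susceptible-infected-recovered (SIR) model in this study are as follows:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{b S I}{N},\\
\frac{dI}{dt} &= \frac{b S I}{N} - g I,\\
\frac{dR}{dt} &= g I,
\end{aligned}
$$

where b is the transmission rate, g is the recovery rate, S, I, and R represent the numbers of susceptible, infected, and recovered individuals, respectively, and N = S + I + R.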
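The differential equations of the SEIR model in this study are as follows:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{b S I}{N},\\
\frac{dE}{dt} &= \frac{b S I}{N} - s E,\\
\frac{dI}{dt} &= s E - g I,\\
\frac{dR}{dt} &= g I,
\end{aligned}
$$

where b is the transmission rate, g is the recovery rate, s is the rate at which an exposed person becomes infective, S, I, E, and R represent the numbers of susceptible, infected, exposed, and recovered individuals, respectively, and N = S + I + E + R.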