1 Introduction
A large number of ecological studies have used climate-based models; two prominent examples are ecological niche models (
Waltari et al., 2007;
Feilhauer et al., 2012) and species distribution models (SDM) (
Anderson, 2012). For such purposes, the United States Geological Survey (USGS) developed climate indices known as bioclimatic variables (
O’Donnell and Ignizio, 2012). The bioclimatic variables are widely used in species distribution modeling (Attorre et al., 2007;
Waltari et al., 2014;
Salas et al., 2017). SDMs integrate information on species occurrence with environmental features to estimate the distributional range of a species (Vega et al., 2018). SDMs are, moreover, valuable for other applications across evolutionary ecology and biology (
Title and Bemmels, 2018).
Besides those applications, bioclimatic variables also capture features of climate (
Mesquita and Sousa, 2009) that are directly related to plant physiologic processes determining primary productivity (
Leathwick et al., 2003). The bioclimatic variables represent the types of seasonal trends relevant to the physiologic constraints of different species (
O’Donnell and Ignizio, 2012). Bioclimatic variables also include information on annual conditions, as well as seasonal mean climate conditions and intra-year seasonality (
O’Donnell and Ignizio, 2012;
Fick and Hijmans, 2017). These variables represent annual trends, seasonality, and extreme or limiting environmental factors (
Hijmans et al., 2005). Because of these characteristics, bioclimatic variables are widely used for vegetation mapping (Franklin, 1995;
Hengl et al., 2018), and to study effects of climate change on species distribution for past, current and future scenarios (
Sykes et al., 1996;
Peng, 2000;
Walther et al., 2005;
O’Donnell and Ignizio, 2012), to monitor exotic and invasive species (
Arriaga et al., 2004), for regional planning (
Bryan and Crossman, 2008), ecosystem distribution (
Thompson et al., 2004), and to assess drought risk (
Incerti et al., 2007). At the global level, a set of 19 gridded data sets was developed within WorldClim based on weather stations, involving data from the Global Historical Climatology Network (GHCN) (
Lawrimore et al., 2011), the World Meteorological Organization climatological database, and additional minor database-specific weather stations (WMO, 2014). In WorldClim, the bioclimatic variables were derived from two types of climatic data, namely monthly temperature (mean, minimum, and maximum) and monthly total precipitation, to generate more biologically meaningful variables (O’Donnell and Ignizio, 2012).
There are two versions of WorldClim bioclimatic variables. WorldClim version 1.4 is a global climate gridded data set for the years 1961–1990 (excluding Antarctica) at 3 resolutions (2.5 min, 5 min, 10 min) (
Hijmans et al., 2005;
Marchi et al., 2019). WorldClim version 2.0 is a newer data set containing grids interpolated from between 9000 and 60000 weather stations at four spatial resolutions, from 30 s (~1 km²) to 10 min (~340 km²), for the years 1970–2000 (
Fick and Hijmans, 2017). In addition, O’Donnell and Ignizio (2012) developed a set of 20 bioclimatic variables as continuous raster surfaces for the period between 1985 and 2009. Moreover, Vega et al. (2018) reproduced the interpolation methods of WorldClim to create MERRAclim, a global set of 19 bioclimatic variables that includes Antarctica. MERRA (Modern Era Retrospective-analysis for Research and Applications) is a NASA (National Aeronautics and Space Administration) atmospheric reanalysis of satellite information. MERRAclim contains three data sets of 19 bioclimatic variables for the years 1980, 1990, and 2000, derived from hourly temperature and humidity data from 1980 to 2000 at three different resolutions (2.5 min, 5 min, 10 min) (Vega et al., 2018).
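The resolutions quoted for these products are angular, so the corresponding ground distance depends on latitude. The following Python sketch is an illustration only and is not taken from any of the cited data sets. It assumes a spherical Earth on which one arc-minute of latitude spans about 1.852 km (one nautical mile); the function name `cell_size_km` is hypothetical.

```python
from math import cos, radians

# Assumption: one arc-minute of latitude is about 1.852 km (one nautical mile).
KM_PER_ARC_MINUTE = 1.852


def cell_size_km(resolution_arc_min, latitude_deg=0.0):
    """Approximate the ground size of a grid cell with the given angular resolution.

    Returns (north-south extent in km, east-west extent in km, area in km^2).
    The east-west extent shrinks with the cosine of latitude.
    """
    north_south = resolution_arc_min * KM_PER_ARC_MINUTE
    east_west = north_south * cos(radians(latitude_deg))
    return north_south, east_west, north_south * east_west


# A 10 arc-minute cell at the equator and at 47 degrees north.
equator_cell = cell_size_km(10)
northern_cell = cell_size_km(10, latitude_deg=47.0)
```

Under these assumptions, a 10 arc-minute cell is about 18.5 km on a side (roughly 343 km²) at the equator, while at the latitude of Mongolia the east-west extent, and hence the cell area, is noticeably smaller.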
In parallel, various large scale gridded interpolated temperature and precipitation data sets at different spatiotemporal resolutions have been developed from in situ measurements to estimate bioclimatic variables (
Hijmans et al., 2005;
Fick and Hijmans, 2017;
Vega et al., 2018; Marchi et al., 2019). Unfortunately, in situ measured temperature and precipitation data with long temporal coverage are only available from a limited number of meteorological stations with inadequate spatial coverage (
Otgonbayar et al., 2019). These data sets, therefore, suffer from uneven geographic coverage, with many areas of the Earth poorly represented (
Hijmans et al., 2005).
In contrast, Earth Observation (EO) satellites capture the entire Earth surface at much denser ground sampling distances (GSD) and with a high temporal revisit frequency (usually 1 day). These data permit the estimation of monthly mean, minimum, and maximum surface temperature (
Benali et al., 2012), as well as monthly total precipitation (
Sun et al., 2018). As sensor technology advances at a rapid pace, advanced geo-informatics techniques offer an opportunity to estimate monthly temperature more accurately and to derive precipitation data from remote sensing sources such as multispectral imagery, radio detection and ranging (RADAR), and light detection and ranging (Lidar) at different spectral, spatial, and temporal resolutions. For instance, Fick and Hijmans (2017) found that satellite data improved the prediction quality of temperature variables by 5%–15%, especially in areas with a low density of weather stations. To improve the accuracy of precipitation data, they suggested using satellite-based precipitation data as covariates.
Amiri et al. (2020) estimated 19 bioclimatic variables from instrumental temperature and precipitation records (Model 1) and from remote sensing data (Model 2), together with three topographic variables, at a resolution of 1 km for 2001–2017 in Isfahan province of Iran using five different regression models. Accuracy statistics were higher for Model 2 than for Model 1, indicating that bioclimatic variables derived from satellite data were more effective. Our main goal is to explore alternative ways to improve the temporal and spatial resolution of bioclimatic variables derived from remotely sensed data for species distribution modeling. The specific aim of this study was to estimate bioclimatic variables using time series of land surface temperature (LST) from the Moderate Resolution Imaging Spectroradiometer (MODIS) and precipitation (P) from Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) data, and to apply the model to the entire land surface of Mongolia. For this analysis, we estimated monthly maximum, mean, and minimum air temperature from Terra MODIS LST (MOD11A2) for the period 2002‒2017 using the random forest (RF) regression model and three predictors (
Otgonbayar et al., 2019;
Erdenedalai et al., 2020).
2 Study area
The study area covers the entirety of Mongolia (~1.566 × 10⁶ km²) and is shown in Fig. 1 together with some environmental variables. The maps illustrate the spatial variability of the topography, climate, vegetation, and ecoregion conditions. The land surface elevation ranges between 524 m and 4320 m above sea level (Fig. 1(a)). Of the total territory of Mongolia, 16.3% lies < 1000 m above sea level, 43.6% at 1000‒1500 m, 22.6% at 1500‒2500 m, 11.6% at 2500‒3500 m, and 5.9% > 3500 m. According to the Köppen climate classification (Fig. 1(b)), the Mongolian climate ranges from dry (B) to continental (D). The annual mean temperature (Ta) ranges between -11.4°C and 9.6°C with strong temperature gradients (Fig. 1(c)). The temperature decreases continuously from the north-west to the south-east. Annual total precipitation ranges between 26 and 50 mm in the semi-desert and desert regions, and between 500 and 649 mm in the alpine meadow to tundra regions (Fig. 1(d)). The average aboveground biomass, as approximated by the normalized difference vegetation index (
NDVI) increases gradually from south to north (Fig. 1(e)). Generally,
NDVI is highly correlated with temperature and precipitation. In the terrestrial ecoregion world map (
Olson et al., 2001), Mongolia is divided into 8 ecoregions from desert to tundra (Fig. 1(f)).
3 Data and method
3.1 Temperature data
Air temperature measurements from station data can in principle be interpolated to derive spatial maps (
Robeson, 1994). However, interpolation errors are often significant, depending on local conditions and the spatial and temporal resolution of measured air temperature data and station density (
Dodson and Marks, 1997). As in many other countries, Mongolia’s weather station network for air temperature observations has insufficient spatial coverage. Satellite-derived land surface temperature (LST) data provide continuous spatial and temporal coverage and might, therefore, be used to model temperature fields. However, satellites measure only LST, and hence air temperature has to be estimated (
Hooker et al., 2018). A suitable approach was presented by Otgonbayar et al. (2019), who estimated monthly maximum, minimum, and average air temperature over Mongolia using MODIS LST (MOD11A2, v006) time series and the random forest (RF) regression model. MODIS LST was obtained through the online data pool of the National Aeronautics and Space Administration (NASA) Land Processes Distributed Active Archive Center (LP DAAC). Using this approach, we created spatial maps of monthly maximum temperature, minimum temperature (
Erdenedalai et al., 2020), and average temperature (
Otgonbayar et al., 2019) for the period of 2002‒2017, at a spatial resolution of 1 km.
3.2 Precipitation data
Over the past two decades, numerous precipitation products have been generated from gauge-radar and gauge-satellite harmonized precipitation analysis at regional to global levels (
Bai and Liu, 2018;
Li et al., 2013;
Price et al., 2014). Detailed information on precipitation databases combining rain gauge, satellite, and reanalysis products can be found, for instance, in
Roca et al. (2019) and
Beck et al. (2017). Here, we used two combined gauge-satellite data sets with a fine spatial resolution: CHIRPS and PERSIANN-CCS (Table 1). To select the most appropriate precipitation data, we compared CHIRPS and PERSIANN-CCS (see Appendix 1 for details). According to our findings (Table A1, Fig. A1), the CHIRPS data were overall far more accurate than PERSIANN-CCS. The main limitation of the CHIRPS data was their limited spatial coverage: CHIRPS covers only the area 50°S–50°N, whereas the northern part of Mongolia extends to 52°N. To generate a gap-free wall-to-wall map for the entire territory of Mongolia, we filled the part of Mongolia between 50°N and 52°N with data from the Climate Hazards Center’s Precipitation Climatology data version 1.0.
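The gap-filling step can be pictured as a row-wise mosaic of two co-registered grids. The Python sketch below is a simplified illustration under stated assumptions, not the processing chain used in this study: it assumes both grids share the same rows and columns, that each row has a known centre latitude, and that the switch between sources happens at 50°N. The function name `mosaic_by_latitude` and all example values are hypothetical.

```python
def mosaic_by_latitude(row_lats, primary, fallback, lat_limit=50.0):
    """Combine two co-registered grids row by row.

    Rows whose centre latitude lies north of lat_limit are taken from the
    fallback grid; all other rows are taken from the primary grid.
    """
    if not (len(row_lats) == len(primary) == len(fallback)):
        raise ValueError("grids and latitude list must have the same number of rows")
    return [
        list(fallback[i]) if lat > lat_limit else list(primary[i])
        for i, lat in enumerate(row_lats)
    ]


# Hypothetical 4 x 2 example: the primary grid has no data north of 50 degrees.
row_lats = [51.5, 50.5, 49.5, 48.5]
primary = [[None, None], [None, None], [12.0, 14.0], [9.0, 11.0]]
fallback = [[20.0, 21.0], [18.0, 19.0], [13.0, 15.0], [10.0, 12.0]]
filled = mosaic_by_latitude(row_lats, primary, fallback)
```

In practice the two sources would first have to be resampled to a common grid, and a hard switch at a fixed latitude may leave a visible seam that needs checking.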
CHIRPS monthly total precipitation data sets were obtained from the Climate Hazards Center website at a spatial resolution of 0.05°, with a spatial coverage of 87°E‒120°E, 41°N‒50°N, for the period between 2002 and 2017. These data sets were developed in collaboration with scientists at the USGS Earth Resources Observation and Science (EROS) Center, supported by the United States Agency for International Development’s (USAID) Famine Early Warning Systems Network (FEWS NET). The data sets build on ‘smart’ interpolation techniques and on precipitation estimates based on infrared Cold Cloud Duration (CCD) observations, and are available in GeoTIFF, NetCDF, and BIL formats. The unit is mm per period, where the period is a day, a pentad, or a month (Funk et al., 2015).
Here, we compared the satellite-derived data sets (MODIS and CHIRPS) for the period 2002–2017 with the weather station-based WorldClim data sets. The WorldClim data sets comprise grids interpolated from in situ station data for the 1970–2000 period (Fig. 2). The monthly maximum, mean, and minimum temperatures estimated from MODIS LST are more highly correlated with the corresponding WorldClim data sets than the precipitation data sets are (Table 2).
3.3 Methods
To calculate the 19 bioclimatic variables at 1 km spatial resolution, we used the functions listed in Table 3. The inputs to these functions were satellite-derived air temperature (monthly maximum, monthly average, and monthly minimum) and monthly total precipitation. Descriptive statistics for the four input variables are provided in Table A2 and Fig. A2 (Appendix 2, Appendix 3). All calculations were done in R, an environment for statistical computing and graphics (Ripley, 2001), and in the System for Automated Geoscientific Analyses (SAGA GIS) for the analysis of spatial data (SAGA G, 2013). The bioclimatic variables were computed with the ‘biovars’ function of the ‘dismo’ package in R. To test the differences between our set of bioclimatic variables (“SatClim”) and the WorldClim bioclimatic variables, we used the coefficient of determination (R²), the root mean squared error (RMSE), and the normalized root mean squared error (nRMSE), as described in Table 4.
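The study itself relied on the ‘biovars’ function of the R package ‘dismo’. Purely to illustrate what some of these functions compute, the Python sketch below derives a subset of the variables for a single grid cell, following the standard definitions in O’Donnell and Ignizio (2012). The function name `basic_biovars` and the example values are hypothetical, and passing the monthly mean temperature as an explicit input is an assumption about how the inputs are combined.

```python
def basic_biovars(tmin, tmean, tmax, prec):
    """Compute a subset of bioclimatic variables for one grid cell.

    Each argument is a sequence of 12 monthly values
    (temperatures in degrees Celsius, precipitation in mm).
    """
    for series in (tmin, tmean, tmax, prec):
        if len(series) != 12:
            raise ValueError("expected 12 monthly values per input")

    bio01 = sum(tmean) / 12                                  # annual mean temperature
    bio02 = sum(hi - lo for hi, lo in zip(tmax, tmin)) / 12  # mean diurnal range
    bio05 = max(tmax)                                        # max temperature of warmest month
    bio06 = min(tmin)                                        # min temperature of coldest month
    bio07 = bio05 - bio06                                    # temperature annual range
    # Isothermality: diurnal range as a percentage of the annual range.
    bio03 = 100 * bio02 / bio07 if bio07 != 0 else float("nan")

    return {
        "bio01": bio01, "bio02": bio02, "bio03": bio03,
        "bio05": bio05, "bio06": bio06, "bio07": bio07,
        "bio12": sum(prec),   # annual precipitation
        "bio13": max(prec),   # precipitation of wettest month
        "bio14": min(prec),   # precipitation of driest month
    }


# Hypothetical monthly values for one cell with a continental climate.
tmin = [-20, -15, -5, 0, 5, 10, 12, 10, 4, -2, -10, -18]
tmean = [t + 5 for t in tmin]
tmax = [t + 10 for t in tmin]
prec = [2, 3, 5, 10, 20, 50, 80, 70, 30, 10, 5, 3]
example = basic_biovars(tmin, tmean, tmax, prec)
```

The sketch makes visible why variables 02 and 03 depend directly on the monthly temperature extremes: both are built from the difference between maximum and minimum temperature.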
4 Results
In Table 5, descriptive statistics of the estimated 19 bioclimatic variables are reported. Spatial maps of 19 bioclimatic variables at a spatial resolution of 1 km for the period 2002–2017 are shown in Fig. 3. In this figure we also provide a comparison between the estimated bioclimatic variables (SatClim) and WorldClim version 2. Results of the statistical analysis are reported in Table 6.
For almost all of the 19 SatClim and WorldClim bioclimatic variables, the linear regression between the > 1.5 × 10⁶ pairs of values revealed high correlations (R² ≥ 0.70). Only for the annual mean diurnal range (02) and isothermality (03) did we find lower, but still moderate, correlations (R² ≈ 0.40–0.46). These were also the only two variables for which the normalized RMSE (nRMSE) slightly exceeded 10%. The nRMSE of the remaining 17 bioclimatic variables was below 8% in all cases (with nRMSE < 4% for six variables). The eight precipitation-related bioclimatic variables were generally more closely correlated with WorldClim than the 11 temperature-related bioclimatic variables (Table 6). Examining the consistency of the retrieved frequency distributions (WorldClim versus SatClim), we generally found very similar patterns, often characterized by multi-modal distributions (Fig. 3, last column).
Together, our results demonstrate that the spatial patterns, value ranges, and frequency distributions of WorldClim were generally well reproduced using the satellite-derived inputs of SatClim. For annual mean diurnal range and isothermality, the lower correlations can be attributed to the fact that temperature extremes enter into their calculation, and such extremes are generally less well retrieved using satellite-based modeling techniques.
5 Discussion
Bioclimatic variables convey information about annual conditions (e.g., 01, 02, 07, and 12), seasonal variations (e.g., 05, 06, 13, 14), and intra-year seasonality (e.g., 08–11, 16–19). These variables serve as indicators relevant to the physiological constraints of species and are valuable for a number of applications (
O’Donnell and Ignizio, 2012).
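The quarter-based variables (08–11 and 16–19) summarise conditions over runs of three consecutive months. The Python sketch below illustrates the idea for a single grid cell; it is not the code used in this study. It assumes that quarters are evaluated as 12 overlapping three-month windows that wrap from December into January, and that ties are resolved by taking the first window. The function names and example values are hypothetical.

```python
def quarter_sums(values):
    """Sum each run of three consecutive months, wrapping December into January."""
    n = len(values)
    return [values[i] + values[(i + 1) % n] + values[(i + 2) % n] for i in range(n)]


def quarterly_biovars(tmean, prec):
    """Compute a subset of the quarter-based bioclimatic variables for one cell."""
    if len(tmean) != 12 or len(prec) != 12:
        raise ValueError("expected 12 monthly values per input")

    prec_q = quarter_sums(prec)                    # precipitation total per quarter
    temp_q = [s / 3 for s in quarter_sums(tmean)]  # mean temperature per quarter
    wettest = prec_q.index(max(prec_q))            # first wettest quarter
    driest = prec_q.index(min(prec_q))             # first driest quarter

    return {
        "bio08": temp_q[wettest],  # mean temperature of wettest quarter
        "bio09": temp_q[driest],   # mean temperature of driest quarter
        "bio10": max(temp_q),      # mean temperature of warmest quarter
        "bio11": min(temp_q),      # mean temperature of coldest quarter
        "bio16": prec_q[wettest],  # precipitation of wettest quarter
        "bio17": prec_q[driest],   # precipitation of driest quarter
    }


# Hypothetical monthly values with a summer precipitation maximum.
tmean = [-15, -10, 0, 5, 10, 15, 17, 15, 9, 3, -5, -13]
prec = [2, 3, 5, 10, 20, 50, 80, 70, 30, 10, 5, 3]
quarters = quarterly_biovars(tmean, prec)
```

With a summer precipitation maximum, as in this example, the wettest quarter coincides with the warmest one and the driest quarter with the coldest one, so variables 08 and 10, and 09 and 11, take the same values.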
In recent years, bioclimatic variables at the global level have mostly been estimated from two commonly used types of data sets, namely WorldClim data sets (
Fick and Hijmans, 2017;
Marchi et al., 2019), and MERRAclim data sets (Vega et al., 2018). WorldClim version 1 and version 2 are global gridded data sets at a spatial resolution of ~1 km². They are representative of the time periods 1961–1990 and 1970–2000, respectively (
Fick and Hijmans, 2017). WorldClim climate data sets, and bioclimatic variables are produced by geo-statistical interpolation methods (i.e., kriging and spline).
The MERRAclim bioclimatic variables are estimated from the MERRAclim data sets, which are produced from station-based hourly air temperature data and gridded specific humidity data (instead of precipitation) from the Modern Era Retrospective Analysis for Research and Applications (MERRA) reanalysis using a spline interpolation method for the years 1980, 1990, and 2000 (Vega et al., 2017). The MERRA data set is thus a climate reanalysis data set based on weather station and modern remote sensing data. A disadvantage of the MERRAclim bioclimatic variables is their coarse spatial resolution (10, 5, and 2.5 arc-minutes).
Moreover, several studies (Waltari et al., 2014; Brown and Comrie, 2002; Kurtzman and Kadmon, 1999; Nikolova and Vassilev, 2006) that applied various interpolation methods, including kriging (co-, simple, and ordinary), thin plate smoothing splines, and inverse distance weighting (IDW), to both precipitation and temperature data sets had different levels of success and generally revealed larger errors for precipitation than for temperature (Mesquita and Sousa, 2009). In addition, the uncertainty of interpolation-based methods increases with time, and a temporal mismatch between the study period and interpolation-based climate data sets (e.g., WorldClim) might lead to unsuitable predictions (Amiri et al., 2020). Conversely, satellite-derived data are continuous in spatial and temporal coverage. Moreover, real-time access to satellite data has led to more up-to-date climate data (Amiri et al., 2020).
To remedy these limitations, we estimated bioclimatic variables (SatClim) using MODIS LST and CHIRPS data for the years 2002‒2017. For this analysis, we estimated monthly maximum, mean, and minimum air temperature from Terra MODIS LST (MOD11A2) for the period 2002‒2017 using the random forest (RF) regression model and three predictors (
Otgonbayar et al., 2019;
Erdenedalai et al., 2020). We examined the relationship between SatClim and WorldClim bioclimatic variables version 2.0 for the entire territory of Mongolia using the coefficient of determination (
R²), root mean squared error (
RMSE), and normalized root mean squared error (
nRMSE), which represent spatial correlation (association) and error (residual) (
Richter et al., 2012).
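The three agreement measures can be sketched in a few lines of Python. The exact formulas used in this study are those of Table 4, which is not reproduced here, so the sketch makes two assumptions: R² is taken as the squared Pearson correlation of a simple linear regression, and the RMSE is normalized by the value range of the reference data and expressed in percent. The function name `agreement_stats` and the example values are hypothetical.

```python
from math import sqrt


def agreement_stats(reference, estimate):
    """Return (R2, RMSE, nRMSE in percent) for paired reference and estimated values."""
    n = len(reference)
    if n != len(estimate) or n < 2:
        raise ValueError("need two equally long series with at least two values")

    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_est = sum((e - mean_est) ** 2 for e in estimate)
    cross = sum((r - mean_ref) * (e - mean_est) for r, e in zip(reference, estimate))
    if ss_ref == 0 or ss_est == 0:
        raise ValueError("both series must vary for R2 and nRMSE to be defined")

    r2 = cross ** 2 / (ss_ref * ss_est)  # squared Pearson correlation
    rmse = sqrt(sum((e - r) ** 2 for r, e in zip(reference, estimate)) / n)
    # Assumption: normalize by the range of the reference values.
    nrmse = 100 * rmse / (max(reference) - min(reference))
    return r2, rmse, nrmse


# Hypothetical example: the estimate is offset from the reference by a constant 1.
r2, rmse, nrmse = agreement_stats([0, 10, 20, 30], [1, 11, 21, 31])
```

The example shows why the measures are complementary: a constant offset leaves R² at 1 because the spatial pattern is reproduced perfectly, whereas RMSE and nRMSE still register the systematic difference.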
In general, and considering WorldClim as a “reference”, the spatial patterns of all 19 bioclimatic variables were well retrieved from MODIS and CHIRPS data, with moderate to high positive correlations and similar (often multi-modal) frequency distributions. The lower performance for the two variables annual mean diurnal range (02) and isothermality (03) can be attributed to the fact that temperature extremes enter into their calculation. These temperature extremes are often underestimated when satellite-derived input data are used (
Janatian et al., 2017;
Duan et al., 2018;
Hooker et al., 2018). The other 17 variables were estimated with an nRMSE of < 8%, and six of these 17 variables had an nRMSE < 4% (Table 6).
Amiri et al. (2020) estimated 19 bioclimatic variables from instrumental temperature and precipitation records (Model 1) and from remote sensing data (Model 2), together with three topographic variables, at a resolution of 1 km for 2001–2017 in Isfahan Province of Iran using five different regression models. Accuracy statistics were higher for Model 2 than for Model 1, indicating that bioclimatic variables derived from satellite data were more effective.
The success of our satellite-derived method can be attributed to the fact that precipitation and temperature can be relatively well retrieved remotely (Kidd et al., 2010;
Li et al., 2013;
Funk et al., 2015;
Beck et al., 2017;
Paredes-Trejo et al., 2017;
Sun et al., 2018), especially in high-elevation or mountainous areas (
Fick and Hijmans, 2017). In those areas, spatially and temporally continuous grids of land surface temperature (LST) are valuable inputs for accurate and robust air temperature retrievals with monthly resolution (
Otgonbayar et al., 2019). In a similar way, by observing cloud top temperatures, it is possible to model monthly precipitation fields with relatively high accuracy (
Bai and Liu, 2018). Without Earth Observation (EO) data, these primary variables have to be modeled and/or interpolated from sparse station data, which often fail to capture local peculiarities (
Vancutsem et al., 2010;
Benali et al., 2012;
Atzberger and Rembold, 2013).
6 Conclusions
Spatial maps of 19 bioclimatic variables at a spatial resolution of 1 km were generated for the entire territory of Mongolia, representing the period 2002–2017. The analysis used two different satellite time series: Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature (LST) and Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) data. To estimate monthly maximum, mean, and minimum air temperature, the random forest regression model was used with time series of LST (from Terra MODIS collection 6) as a predictor variable. Monthly total precipitation data were obtained from CHIRPS version 2.0.
Seventeen bioclimatic variables derived from MODIS and CHIRPS data had a strong positive correlation with the WorldClim bioclimatic variables, and their frequency distributions were similar. The lower performance for the two remaining variables, annual mean diurnal range (02) and isothermality (03), can be attributed to the fact that temperature extremes enter into their calculation. These temperature extremes are often underestimated when satellite-derived input data are used (
Janatian et al., 2017;
Duan et al., 2018;
Hooker et al., 2018). As a consequence of the successful retrieval of the bioclimatic variables, we are confident that the estimated 19 bioclimatic variables will be very useful for a range of applications, in particular, if a higher spatial resolution is required such as for species distribution modeling.
The success of the modeling can be attributed to the fact that climatologies of both air temperature and precipitation can be well retrieved from EO data, in particular if aggregated over monthly intervals and for regions such as Mongolia. In areas with sparse station density, EO data avoid the otherwise necessary interpolation techniques.
The main limitation of many EO products is that the data sets are still relatively short (e.g., MODIS LST starting only in 2002) and that data from multiple satellites would have to be combined and normalized if longer time series are required. The advantage of the MODIS data set is, however, that it covers the most recent 15 years. In the future, improvements in spatial and temporal resolution, as well as in spatial coverage, will favor EO data even more compared to other techniques, as new satellites are launched at an unprecedented pace. For future research, we recommend focusing on improving the quality and the spatial and temporal resolution of precipitation estimates.