1. School of Environment and Engineering, Anhui Jianzhu University, Hefei 230022, China
2. School of Earth Sciences and Engineering, Hohai University, Nanjing 211100, China
zhaohui.xue@hhu.edu.cn
Show less
History+
Received
Accepted
Published
2016-12-16
2018-04-15
2019-01-25
Issue Date
Revised Date
2018-07-25
PDF
(12882KB)
Abstract
Phenology has become a good indicator for illustrating the long-term changes in the natural resources of the Yangtze River Delta. However, two issues can be observed from previous studies. On the one hand, existing time-series classification methods mainly using a single classifier, the discrimination power, can become deteriorated due to fluctuations characterizing the time series. On the other hand, previous work on the Yangtze River Delta was limited in the spatial domain (usually to 16 cities) and in the temporal domain (usually 2000–2010). To address these issues, this study attempts to analyze the spatio-temporal variation in phenology in the Yangtze River Delta (with 26 cities, enlarged by the state council in June 2016), facilitated by classifying the land cover types and extracting the phenological metrics based on Moderate Resolution Imaging Spectrometer (MODIS) Normalized Difference Vegetation Index (NDVI) time series collected from 2001 to 2015. First, ensemble learning (EL)-based classifiers are used for land cover classification, where the training samples (a total of 201,597) derived from visual interpretation based on GlobelLand30 are further screened using vertex component analysis (VCA), resulting in 600 samples for training and the remainder for validating. Then, eleven phenological metrics are extracted by TIMESAT (a package name) based on the time series, where a seasonal-trend decomposition procedure based on loess (STL-decomposition) is used to remove spikes and a Savitzky-Golay filter is used for filtering. Finally, the spatio-temporal phenology variation is analyzed by considering the classification maps and the phenological metrics. The experimental results indicate that: 1) random forest (RF) obtains the most accurate classification map (with an overall accuracy higher than 96%); 2) different land cover types illustrate the various seasonalities; 3) the Yangtze River Delta has two obvious regions, i.e., the north and the south parts, resulting from different rainfall, temperature, and ecosystem conditions; 4) the phenology variation over time is not significant in the study area; 5) the correlation between gross spring greenness (GSG) and gross primary productivity (GPP) is very high, indicating the potential use of GSG for assessing the carbon flux.
Yongfeng WANG, Zhaohui XUE, Jun CHEN, Guangzhou CHEN.
Spatio-temporal analysis of phenology in Yangtze River Delta based on MODIS NDVI time series from 2001 to 2015.
Front. Earth Sci., 2019, 13(1): 92-110 DOI:10.1007/s11707-018-0713-0
Remote sensing time series of medium spatial resolution optical data have demonstrated a high capacity for providing plenty of useful land cover information both in the spatial and temporal domains, leading to a better understanding of regional ecosystem dynamics. Moderate Resolution Imaging Spectrometer (MODIS) vegetation time series data have been used for biomass estimation (Anderson et al., 2016; Zhang et al., 2016), crop yield estimation (Zhang and Zhang, 2016), biophysical character analysis (Ghosh et al., 2016), land cover mapping (Gómez et al., 2016; Shao et al., 2016), and phenology analysis (Verbesselt et al., 2010; Marston et al., 2016; Verger et al., 2016; Zeng et al., 2016; Zhou et al., 2016). Great efforts have been made in land cover classification and phenology analysis using MODIS vegetation time-series data. Land cover classification is essential for managing natural resources, modeling environmental variables, and understanding habitats (Gómez et al., 2016), which has been enhanced by various methods. For example, Demir et al. (2013) have proposed a novel change-detection-driven transfer learning (TL) approach to update land-cover maps by classifying time series data. A hidden Markov model (HMM) was used to distinguish real land cover changes from spurious land cover changes (Abercrombie and Friedl, 2016). Multilabel classification, a powerful framework in machine learning, was used for inferring the complex relationships between time series and spectral profiles of different types of surface materials (Karalas et al., 2016). Smoothing algorithms were applied to MODIS NDVI time-series data to denoise data used for land-cover classifications across the Laurentian Great Lakes Basin (Shao et al., 2016). MultiLayer Perceptron (MLP) considering the Abundance Non-negativity Constraint and the Abundance Sum-to-one Constraint was proposed for the multi-class sub-pixel land cover classification of a time series of low-resolution MODIS images covering the northern part of Belgium (Heremans et al., 2016). Support vector machines (SVMs) were adopted to address the space-time distribution of the dominant vegetation land cover types in Iraq, particularly croplands, based on MODIS NDVI from 2002 to 2012 (Qader et al., 2016). A one-class SVM was also used to detect different rice- growth signatures and classify paddy rice areas in continental China (Clauss et al., 2016). A Dynamic Time Warping distance-based similarity measure approach was used to reduce the inaccuracy of classification using a single image (Guan et al., 2016). Wohlfart et al. (2016) have derived land cover characteristics and dynamics from the last decade based on optical high-temporal MODIS NDVI time series for the whole of Yellow River Basin. Zhu et al. (2016) have proposed a new approach to map fractional cropland distributions in Mato Grosso, Brazil, using time series of MODIS enhanced vegetation index and Landsat Thematic Mapper data.
Time-series data have demonstrated a high capacity for land cover classification, where machine learning methods (e.g., TL, HMM, multilabel classification, MLP, and SVM) serve as engines for promoting progress in this community. Although high performances have been obtained by the aforementioned classifiers, the discrimination power may be deteriorated due to the fluctuations of time series when using a single classifier.
In this context, ensemble learning may be more effective for the classification of time-series data. Ensemble learning was designed to combine results from many weak learners into one ensemble predictor, aiming at improving classification accuracy and stability (Zhang and Ma, 2012). Generating such an effective ensemble lies in the core of this paradigm, where the base learner, an ensemble of weaker learners, and the combination method commonly different from various ensemble learning methods. Bagging, Random Forest, and Rotation Forest, respectively, focus on training data sampling, feature selection, and feature extraction to generate the associated ensemble (Du et al., 2012). Previous work have validated the good performance of these methods for hyperspectral remote sensing image classification (Xia et al., 2014, 2015; Xue et al., 2015). Our previous work adopted ensemble learning for phenology-driven land cover classification based on time-series (Xue et al., 2014a).
Phenology analysis focuses on land cover seasonal trajectory with three major characteristics (Verbesselt et al., 2010; Xue et al., 2014a): 1) Phenological features, i.e., a significant change in seasonal shape. For inter-annual periods, this change appears as phenological metric (e.g., start of season, SOS) changes originating from short- term climate fluctuations (e.g., temperature and rainfall). For a long period, this change manifests as annual phenological shifts caused by long-term climate changes or large-scale anthropogenic disturbances (e.g., urbanization). 2) An abrupt change, i.e., a step change driven by temporal disturbances (e.g., deforestation, floods, fires or sensor errors). 3) A gradual trend, i.e., a linear trend triggered by a gentle change in seasonality (e.g., land degradation or a long-term change in rainfall).
Several methods for deriving or analyzing phenological metrics have been proposed. For example, Chen et al. (2016) have proposed a simple method called weighted cross-correlogram spectral matching-phenology (CCSM-P), which combines CCSM and a weighted correlation system, for detecting vegetation phenological changes using multiyear vegetation index time series. A simple smoother without any local adjustments based on a Continuous Wavelet Transform was proposed for deriving phenological metrics based on MODIS NDVI time-series data (Qiu et al., 2016). In the work performed by Zhao et al. (2016), the land surface phenological metrics were calculated using the third generation dataset from the Global Inventory Modeling and Mapping Studies (GIMMS 3 g) for 1982 to 2013. The metrics were then estimated to analyze the variations in the land surface phenology in Northeast China at different scales and to discuss the internal relationships between phenology and climate change. Zhou et al. (2016) have investigated the trends of the phenological metrics (i.e., start, end, and length of growing season: SOS, EOS and GSL, respectively) of individual cities and across cities relative to rural areas for the 32 major cities of China using MODIS data between 2007 and 2013.
Developments in phenology analysis focus on phenological metric extraction, phenology change detection, the relationships between phenology and climate change, phenology trend investigation, the relationships between phenology and urbanization, and the relationships between LSP and LST. Phenology has become a good indicator for illustrating the long-term phenology changes.
China has experienced a rapid urban expansion over the past three decades because of its accelerated economic growth, leading to several typical urban agglomerations, especially at the Yangtze River Delta. Few studies of land cover change have been conducted on the Yangtze River Delta based on MODIS vegetation time-series data. For example, Zhao et al. (2009) have investigated the spatio-temporal variations (2001‒2006) of vegetation cover at Dongtan, Chongming Island, in the Yangtze River estuary, China, to assess its rapid vegetation succession and physical conditions. Wei et al. (2012) have used the TIMESAT program to quantify annual vegetation production and trends based on Advanced Very High Resolution Radiometer NDVI imagery obtained from 1982 to 2006 over China, including the Pearl River Delta, the Yangtze River Delta, and areas along the Yangtze River. Han and Xu (2013) have used the SPOT/VGT NDVI time-series images (2002‒2009) and MODIS land surface temperature (LST) images (2002‒2009) smoothed by a Savitzky-Golay filter to extract the land surface phenology (LSP) and LST, respectively, for six cities in the Yangtze River Delta, China. Li et al. (2015) have used MODIS observations to describe the changes in land surface characteristics, including land-cover type, green vegetation fraction, and leaf area index, with a weather research and forecasting model. Shi and Huang (2015) have proposed a Normalized Weighted Difference Water Index to identify the unique characteristics of rice during the 2000‒2010 period in the Yangtze River Delta.
However, studies in the Yangtze River Delta were conducted for the period of 2000‒2010 based on the former boundary composed of 16 cities. Given the background, this is the first step in the spatio-temporal phenology analysis of the Yangtze River Delta based on MODIS NDVI time series, particularly for the period of 2001‒ 2015 and with an enlarged boundary covering the 26 cities indicated by the state council in June 2016.
Nevertheless, this work focused on the two aforementioned issues by analyzing the spatio-temporal variation of phenology in the Yangtze River Delta based on the MODIS NDVI time-series data for the period of 2001‒2015, which was motivated by classifying the land cover types and extracting the phenological metrics.
First, we compared different EL-based classifiers with other traditional machine learning-based classifiers for land cover classification, where the training samples derived from visual interpretation were pured using vertex component analysis (VCA) (Nascimento and Dias, 2005). Second, eleven phenological metrics were extracted for land cover phenology analysis based on the time series using a seasonal-trend decomposition procedure based on loess (STL-decomposition) for spike removal and filtered with a Savitzky-Golay filter. Finally, spatio-temporal phenology variations were analyzed by considering the classification maps and the phenological metrics. Noteworthy, each land cover type has a specific phenological pattern. Consequently, land cover classification should be conducted first before analyzing the phenology since one cannot analyze the spatial or temporal phenological variations without the identification of different phenological patterns with respect to different land cover types.
The major novelty and contributions of this study are:
– EL-based classifiers are applied to classify long-term MODIS NDVI time-series data, which can alleviate the effects of signal contamination, thus possibly leading to more accurate classification results.
– Spatio-temporal analysis of phenology in the Yangtze River Delta with an enlarged boundary during the period of 2001‒2015 is conducted. The analysis is found to be more significant and reliable since the research period covers the past 15 years and the spatial domain considers the latest boundary.
– Gross spring greenness is highly related to GPP, which indicates the further potential of using GSG for the assessment of carbon flux.
Study area and data
Study area
The study area is located in the Yangtze River Delta bounded by the East China Sea. The state council passed the “Urban Agglomeration Development Plan of Yangtze River Delta” (the Plan for short) in June 2016. The Plan stipulated that the urban agglomeration of the Yangtze River Delta consists of Shanghai City, Jiangsu, Zhejiang, and Anhui provinces, where Shanghai lies at the core. As shown in Fig. 1, the study area consists of 26 cities: Shanghai; nine cities from Jiangsu province, i.e., Nanjing, Wuxi, Suzhou, Nantong, Yancheng, Yangzhou, Zhenjiang, and Taizhou; eight cities from Zhejiang province, i.e., Hangzhou, Ningbo, Jiaxing, Huzhou, Shaoxing, Jinhua, Zhoushan, and Taizhou; and eight cities from Anhui province, i.e., Hefei, Wuhu, Maanshan, Tongling, Anqing, Chuzhou, Chizhou, and Xuancheng. This covers an area of 21.17 km2, only occupying 2.2% of the total national land area. However, this area produces 12.67 billion CNY of gross domestic product (GDP) according to the statistics obtained in 2014, accounting for nearly 18.5% of the total national GDP (according to “The Yangtze River Delta urban agglomeration development plan” released by the state council in June 2016). Yangtze River Delta is characterized by its high energy, greatest openness, most powerful innovation, and highest input population, making it the most important intersection region in the Yangtze River economic zone.
MODIS data acquisition and pre-processing
The performance of the methodology is demonstrated on 15-year time series (from 2001 to 2015) of 250-meter 16-day composite NDVI datasets MOD13Q1 collected by MODIS. The MODIS NDVI is generated from the MODIS level 2 (L2G) daily surface reflectance products (MOD09 series), which can provide red and near-infrared surface data (Fensholt and Proud, 2012). The MOD13Q1 product has been atmospherically corrected by masking water, clouds, heavy aerosols, and cloud shadows, which have been considered for detecting the phenological response of vegetation (Wardlow et al., 2007). A constrained view-maximum value composite method based on the daily reference data was used to generate the products (Huete et al., 2002). The MODIS NDVI data product is in the Hierarchical Data Format Earth Observation System (HDF-EOS) referenced to a global sinusoidal tiling system with 10° latitude by 10° longitude without overlapping. The study area is covered by four adjacent tiles indexed by “h27v05”, “h27v06”, “h28v05”, and “h28v06”. Using the MODIS reprojection tool, a “*.bat” file is developed to clip, mosaic, re-project, re-sample, and convert the MODIS products into GeoTIFF files in a batch processing mode. All the products are projected onto the Universal Transverse Mercator 50N zone along with a WGS 1984 coordinate system, which is considered as the standard projection and coordinate system throughout our study.
Reference data collection
The reference data are obtained by visual interpretation according to GlobelLand30 (Chen et al., 2014). GlobelLand30 is the world’s first 30 m resolution land cover dataset combining land resource information, HJ-1, and Landsat images. A hierarchical classification strategy was proposed for GlobalLand30, where land cover type extraction, automatic classification using eco-geographical zone data and other reference data, and interactive editing guided by expert knowledge are combined effectively to achieve high-precision datasets. Therefore, GlobalLand30 was applied with multiple data sources and a complex classification system to control the accuracy. In addition, the reference data were selected by visual interpretation based on GlobalLand30 and were downscaled to 250 m to accommodate the MOD13Q1 product using a nearest neighbor interpolation method for resampling. In this context, the inconsistencies and mistakes of visual interpretation based on GlobalLand30 are greatly reduced. Figure 2(a) depicts the land cover map generated by GlobelLand30, and Figure 2(b) shows the region of interest chosen through visual interpretation. The study area includes six land cover types, i.e., cultivated land, forest, grassland, wetland, water bodies, and artificial surfaces, with a total of 201,597 samples.
Methodology
The methodology for the spatio-temporal analysis of phenology in the Yangtze River Delta based on MODIS NDVI time series from 2001 to 2015 is illustrated in detail in Fig. 3 and described with the multiple processing steps given below.
Land cover sample screening
The validation samples (a total of 201,597) are selected by visual interpretation based on GlobeLand30 covering the same area. To reduce the effects caused by temporal and human factors during sample selection, all the interpreted samples are screened using vertex component analysis (VCA) (Nascimento and Dias, 2005). The algorithm exploits two facts: 1) the endmembers are the vertices of a simplex, and 2) the affine transformation of a simplex is also a simplex. The algorithm assumes the presence of pure pixels in the data and iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme of the projection. The algorithm iterates until the number of endmembers is exhausted.
VCA is unsupervised and is based on the geometry of convex sets, which exploits the fact that endmembers occupy the vertices of a simplex, as shown in Fig. 4. VCA performs better than pixel purity index (PPI) and better than or similar to N-finder algorithm (N-FINDR). VCA has, however, the lowest computational complexity among these three algorithms. Savings in terms of computational complexity ranges between one and two orders of magnitude. We adopt VCA to generate more pure samples (100 per class) that can well represent the specific land cover type.
Time-series filtering
STL-decomposition for spike removal
Spikes and outliers can be found in many time series, and it is important to focus on pre-processing since remaining spikes and outliers may seriously degrade the final function fits (Eklundh and Jönsson, 2015). Time-series data can be decomposed into season, trend, and remainder components using the seasonal-trend decomposition procedure based on loess (STL-decomposition) (Verbesselt et al., 2010) given as follows:
where Yt are the observed data and St, Tt, and et are, respectively, the season, trend, and remainder components at time t.
Assume that the seasonal breakpoints have been given as t1# , ...,tp#, and define t0# = 0, tp+1#=n. Thus, St can be represented by a harmonic model with K terms:
where aj,k, dj,k, and f are the segment-specific amplitude, phase, and frequency (e.g., f = 23 for an annual observation of time series with a 16-day interval), respectively. The harmonic term is k, the maximum harmonic is K=3, that is, k = 1, 2, 3.
Suppose that Tt is piecewise linear with specific slopes and intersects on m + 1 segments broken by m breakpoints, t1, ..., tm . Then, Tt is of the form
where i = 1, ..., m, t0 = 0, and tm +1= n.
The decomposition model illustrated in Eq. (1) is not so straightforward due to the unknown season and trend breakpoints, along with aj,k and bj,k for St and ai and bi for Tt. Therefore, an ordinary least squares residual-based moving sum test is often used to test whether the breakpoints occur. St and Tt are finally obtained in an iterative fitting model.
Data values that do not fit the pattern under Eq. (1) are assigned small weights. The spikes can then be detected when the weight is less than a pre-defined threshold (i.e., 1e‒8). To remove those spikes, the time series are assigned weights that are products of the weights from the STL-decomposition.
Savitzky-Golay filter
One way of smoothing data and suppressing disturbances is to use a filter and replace each data value yi, i = 1, ..., N , by a linear combination of nearby values in a window (Chen et al., 2004)
where the weights are cj = 1/(2n + 1) and the data value yi is replaced by the averaged value within the window. The moving-average method preserves the area and mean position of a seasonal peak, but it alters both the width and height. The latter properties can be preserved by approximating the underlying data value with the value obtained from a least-squares fitting to a polynomial of degree.
Ensemble learning-based classification
Ensemble learning is designed to combine results from many weak learners into one ensemble predictor, aiming at improving classification accuracy and stability. The multiple classifiers can be designed independently without any mutual interaction, and their outputs are combined according to a given strategy. Generating such an effective ensemble lies at the core of this paradigm, where the base learner, an ensemble of weaker learners, and the combination strategy commonly differ between various ensemble learning methods. Bagging, Random Forest, and Rotation Forest focus on training data sampling, feature selection, and feature extraction, respectively, to generate the associated ensemble of classifiers. Figure 5 depicts the general framework for Bagging, RoF, RS, and RF, where the ensemble size is set to 10, and majority vote is used to combine the results of different classifiers.
To avoid biased evaluation originating from training data initialization, we applied VCA to choose more reliable samples for each classifier. In each run, 100 labeled samples per class were used for training, and the remaining 200,997 samples were used for validating. We used limited training samples on purpose to test the generalization performance of the different classifiers.
The performance of different classifiers was compared in terms of class-specific accuracy, overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (k). The differences between classifiers were evaluated using the Kappa and McNemar z-score statistical tests (Foody, 2004). The critical value of the z-score is 1.96 at a confidence level of 0.95, which means when z-score value between two classification methods is higher than 1.96, they have significant difference.
Phenological metric extraction
Phenology patterns can be identified by critical points, lines, and areas called phenological metrics in the seasonal trajectory. For example, the start of season (SOS), which means the timing of half maximum during vegetation growth, is one of the most commonly used metrics (Hmimina et al., 2013). Moreover, certain other metrics, such as the end of season (EOS), length of season (LOS), base level of season (BOS), timing of mid-season (TOMS), peak value of season (POS), amplitude of sea- son (AOS), rate of grow-up (ROG), rate of senescence (ROS), gross spring greenness (GSG), and net spring greenness (NSG), were also included in our study based on the fitted time series by TIMESAT. A graphical illustration of the phenological metrics is shown in Fig. 6. Additional details about the definitions of those phenological metrics can be found in Table 1.
Experimental results
Experimental settings
The adopted packages, platform and parameter settings for the experiments are introduced as follows:
– Ensemble learning-based classifiers, including Bagging (Breiman, 1996), Rotation Forest (RoF) (Rodriguez et al., 2006), Random Subspace (RS) (Ho, 1998), and Random Forest (RF) (Breiman, 2001) were compared with other traditional classifiers, i.e., k-nearest neighbors (kNN) (Cover and Hart, 1967), SVM (Cortes and Vapnik, 1995), MLR via variable splitting and augmented Lagrangian (LORSAL) (Li et al., 2011), and sparse representation based classification (SRC) (Wright et al., 2009). The ensemble sizes of the ensemble learning methods were experientially set to 10. For kNN, the neighbor size was set to 3. For SVM, a particle swarm optimization (PSO) (Xue et al., 2014b) strategy was adopted to optimize the parameters used in SVM. For LORSAL, we adopted the recommended parameters. As for SRC, the sparsity level was set to 5. These experimental settings may be suboptimal, but they have produced good results. For each land cover type, VCA was adopted to select the training samples, and the remaining samples were used for validation.
– TIMESAT (Jönsson and Eklundh, 2002) was used to extract the phenological metrics, where STL decomposition was used to remove the spikes and Savitzky-Golay used to fit the time series. The moving window size was set to 4, and the adaptation strength was set to 2. The default number of the inter-annual season was set to 1. Noteworthy, TIMESAT extracts seasonality data only for the n‒1 center-most seasons with a period of n years in the time series, which means that it produces only 14 seasons when inputting 15 years of data.
– The experiments were conducted using Matlab R2015b (64-bit) on a desktop PC equipped with an Intel Xeon E3 CPU (at 3.4 GHz) and 32 GB of RAM.
Land cover classification
We first analyzed the impact of the training set size on the classification accuracy. Figure 7 illustrates the evolution of OA with the number of labeled samples per class. Note that SVM did not perform very well due to the ill-posed classification problem. Therefore, we did not implement SVM in this test. The OAs obtained by the different methods increase as the training set size increases (Fig. 7). EL-based methods, i.e., Bagging, RoF, FS, and RF, obtained higher accuracies than do LORSAL and SRC. Furthermore, LORSAL did not perform very well when the number of labeled samples per class was lower than 60. Another observation was that different EL methods yielded similar results. However, RF significantly outperformed the other methods. Interestingly, kNN yielded a competitive classification performance.
The classification accuracies and the statistical test results for these classifiers are reported in Table 2 with 100 labeled samples per class. Obviously, RF produced the highest OA (96.64%), AA (96.25%), and k (0.954). Bagging generated a competitive classification accuracy with a slightly lower AA (95.30%) compared to RF.
The results of the Kappa and McNemar z-score statistical tests conducted for between-classifier indicated that RF achieved a significantly better classification result compared to other traditional classifiers, which was supported by the high z-score values listed in Table 3. It can be observed that all the tests were significant at 95% confidence level, except for Bagging/RF. The lower Kappa and McNemar z-score values between Bagging and RF illustrate that the two methods generated more similar classification results than other pairs, which agreed with the results listed in Table 2. The results imply that the generalization performance of Bagging and RF are similar for time-series data. It is worth noting that we were not focused on testing which ensemble learning method would produce the best result; instead, we attempted to demonstrate the effectiveness of ensemble learning for time-series data classification compared to other traditional classification methods.
Figure 8 depicts the classification maps obtained by different methods with 100 labeled samples per class. The differences between the classification maps can be greatly appreciated when classifying artificial surfaces. This is due to the fact that the time series of artificial surfaces are unstable and usually affected by more disturbances since the anthropogenic activities in this region are obvious. As expected, Bagging and RF obtain better classification maps. The classification maps for the other ensemble-based classifiers were similar. Cultivated land is located in the north part of the Yangtze River Delta. Forest is distributed in the south part of the Yangtze River Delta. Grassland is not so clear from the classification maps since they are too small. Wetland and water bodies are clearly classified along the Yangtze River. Artificial surfaces are located in the east coast of the ocean.
Phenological metrics extraction
Due to the specific ecosystem of the Yangtze River Delta, several typical phenological patterns occur in this area. The details of the extracted phenological metrics are reported in Table 4. To visually inspect the effects of time-series filtering and phenological metric extraction, the average annual phenologies of the investigated land cover types are illustrated clearly in Fig. 9 after averaging the results of 100 samples.
By comparing the “Raw” and “Fitted” curves in Fig. 9, we can observe that the Savitzky-Golay method with STL-decomposition-based spike removal can effectively filter the time-series data. To illustrate the performance of phenological metric extraction, SOS and EOS are indicated in the curves. According to the figure, we conclude the following:
– Cultivated land is common in many parts along the Yangtze River due to the rich amounts of rainfall. The associated phenology usually has two or more seasons in one calendar year due to the regular farming activities between seasons. In addition, the time series have more disturbances (abrupt changes) triggered by human intervention, and those changes are valuable since they always indicate the timing of urbanization occurring simultaneously.
– Forest is distributed in the south part of the Yangtze River, which is characterized by strong photosynthetic activity. Thus, its inter-annual NDVI value is always high, with a relatively early SOS and later EOS, between which there appears a tiny valley at the middle of the year. It presents extremely low inter-annual variability without a clear peak value due to its insensitivity to rainfall and temperature changes. Typical regions include several parks, e.g., the Sun Yat-Sen Mausoleum in Nanjing.
– Grassland is rare in this region due to the lower spatial resolution of the MODIS dataset. However, the data present typical inter-annual variability in response to cumulative rainfall, resulting in one clear biomass peak and valley. Similar to grassland, wetland also has typical inter-annual variability. However, it is more stable since it is insensitive to urbanization.
– The phenology trajectories of water bodies and artificial surfaces are not illustrated in detail here since we focus on vegetation. However, they still can be classified using the corresponding time series even if they have no phenology characteristics.
Discussion
Analysis of different classifiers
First, kNN assigns an object with the label that is most common among its k nearest neighbors, also known as majority voting. Some drawbacks of the kNN classifier are as follows: 1) A more frequent class tends to dominate the prediction of any unknown sample when the class distribution is skewed, which is also the weakness of majority voting. 2) The performance of kNN is sensitive to the parameter k, which also can be severely degraded by the presence of noise. 3) This method is computational intensive for large amounts of training data; thus, an appropriate nearest neighbor search algorithm should be adopted.
Second, SVM is a non-parametric statistical learning-based classifier that finds a hyperplane determined by a few support vectors learned from the training data and separates the unknown data based on these support vectors. SVM appears to be especially advantageous in the presence of heterogeneous classes with only a small amount of training data available. A major drawback concerning the use of SVM for high-dimensional data classification is the choice of kernel and the setting of other parameters, which can substantially affect the classification performance.
Third, Bagging, Random Forest, and Rotation Forest are ensemble learning- based classifiers that focus on training data sampling, feature selection, and feature extraction, respectively, to generate the associated ensemble. In our study, decision tree was used as the base learner, the ensemble size was set to 10, and majority voting was adopted to produce the final prediction for the four classifiers. The experimental results demonstrate the similar performance of the four methods. Among them, Random Forest, followed by Bagging, obtained the highest accuracy. This observation is in agreement with the results of Fernandez-Delgado et al., who tested 179 classifiers and found that Random Forest is the best classifier (Fernandez-Delgado et al., 2014).
Next, as a sparse representation-based classification method, SRC opened up a new era of study for remote sensing image classification and garnered substantial attention from researchers worldwide. SRC inherits the merits of sparse represen- tation, and it assigns a label for unknown samples according to the label of the sub-dictionary that produces the smallest reconstruction error.
Finally, LORSAL has emerged as a state-of-the-art classifier for remote sensing image classification in recent years. It serves as an efficient optimization algorithm used to learn the regressors in sparse multinomial logistic regression. LORSAL can generate label probability distributions for unknown data, allowing for high flexibility in conjunction with other disciplines involving probabilities.
Spatial phenology analysis
With the fitted and spike-removed time series, we first transformed those time series into HDF-EOS format and then calculated their phenological metrics using TIME- SAT. Within-annual variability for SOS, EOS, LOS, BOS, TOMS, POS, AOS, ROG, ROS, GSG, and NSG can be spatially analyzed in Fig. 10. The legends on the left-hand side of SOS, EOS, LOS, and TOMS represent the calendar day of the inter- annual season, whereas the legends for the other metrics represent the corresponding values.
Several observations can be made from Fig. 10 by visual inspection of the spatial-temporal variations in different phenological metrics. Forest area showed a clear earlier SOS, whereas Cultivated land exhibited an obviously later SOS, which was due to the farming activities. Wetland presented a much earlier SOS (i.e., 0‒38). Most regions presented a similar EOS, with small differences between the north and south parts of the Yangtze River Delta. The LOS of the north part was smaller than that of the south part. Different land cover types exhibit obviously different BOS, and Forest had a higher BOS compared to Cultivated land. Different regions presented different values of TOMS, and the regions along the east coast of the ocean have smaller TOMS, which was in accordance with the SOS and EOS. Most parts of the study area showed similar POS, except for water bodies and artificial surfaces, whereas the south part presented a much lower AOS. The south part exhibited higher ROG and ROS, illustrating the greater vegetation vigor for this region, i.e., the Forest. GSG and NSG reflected the regional spring greenness, which are directly linked with the biomass. The corresponding results demonstrated that the north part of the Yangtze River Delta has a greater capacity for producing biomass, which is in accordance with the classification map since Cultivated land is distributed in the north part.
Temporal phenology analysis
Figure 11 illustrates the changes in different phenological metrics (as a function of year) obtained for different land covers. Different vegetation has different phenology characteristics, which induce the inter-annual variation of phenology, whereas the between-annual phenology changes are mainly affected by rainfall, temperature, urbanization, deforestation, floods, fires, etc. (Verbesselt et al., 2010; Xue et al., 2014a). However, climate changes are the main driving forces for triggering the inter-annual phenology variability.
Atmospherical impacts on phenology
Based on the extracted seasonality parameters, we attempt to explain the phenology changes in the study area. For this purpose, two typical land cover types, cultivated land and forest, were considered. Moreover, the climate statistic data, i.e., rainfall and temperature from 2001 to 2014, were obtained to facilitate this analysis.
Figure 12 illustrates the evolution of yearly SOS and EOS for two typical land cover types against the yearly accumulated rainfall and the average temperature, where the left axis represents SOS or EOS and the right axis means rainfall or temperature. Variability in SOS and EOS reflected a temporal response to climate change, especially for rainfall. For cultivated land, the SOS fell on calendar day 180, which was much earlier than the average level from 2001 to 2014. This can be explained by the lowest rainfall (approximately 50 mm) in this drought year. A similar observation can be made for 2011. Another observation is that there was a significant correlation between EOS and rainfall occurring in 2011. In addition, we also found that SOS and EOS were closely related to temperature, with certifications in 2003 and 2011 for both SOS and EOS. As for forest, this correlation was not as obvious since forest is insensitive to rainfall and temperature changes.
Correlation analysis of GSG versus GPP
We conducted a correlation analysis of GSG versus GPP (Gross Primary Production) to explore the relationship between them. To this end, the average GPP from MODIS 8-day composite products MOD17A2 in the year of 2001, 2007, and 2014 were calculated. Then, these products were mosaicked, projected, resampled, and converted into GeoTIFF files. Third, some randomly selected samples were chosen to use for correlation analysis of GPP versus GSG. A relatively high correlation can be seen in Fig. 13. The correlation coefficients of GPP versus GSG were 0.73, 0.71, and 0.74, respectively, for 2001, 2007, and 2014, which indicates that GSG was highly related to photosynthesis of vegetation. This observation provides further potential use of GSG for the assessment of carbon flux.
Conclusions
In this paper, a novel methodology was presented to address the limitations of existing studies on time-series classification using only a single classifier and the inadequacy of previous work on phenology analysis in smaller spatial domains and for shorter temporal periods, with a special emphasis on the Yangtze River Delta.
We focused on analyzing the spatio-temporal changes in phenology for the Yangtze River Delta (with a new boundary enlarged by the state council in June 2016) based on Moderate Resolution Imaging Spectrometer (MODIS) NDVI time-series data collected from 2001 to 2015 (dating back 15 years). This was achieved by classifying the land cover types using ensemble learning-based classifiers, driven by extracting the phenological metrics using STL-decomposition for spike removal and a Savitzky-Golay filter for filtering, and implemented by analyzing the spatio-temporal phenology, therein considering classification maps and phenological metrics.
The experimental results indicated the following: 1) RF obtains more accurate classification maps; 2) different land cover types illustrate the various seasonalities; 3)the Yangtze River Delta has two obvious regions, i.e., the north and south parts, which may result from the different rainfall, temperature, and ecosystem conditions; 4) the phenology variation over time is not significant in the study area; and 5) the correlation between GSG and GPP is very high, indicating the potential use of GSG for assessing the carbon flux.
Although our experimental results are encouraging, further work on additional scenes and comparison methods should be conducted in the future. We are also planning on extending the current work from two perspectives: 1) we will focus on revealing the phenology drift of the Yangtze River Delta based on the current methodology, and 2) we will analyze the impacts of urbanization, population, and economy on the phenology.
Abercrombie S P, Friedl M A (2016). Improving the consistency of multitemporal land cover maps using a hidden Markov model. IEEE Trans Geosci Remote Sens, 54(2): 703–713
[2]
Anderson M C, Zolin C A, Sentelhas P C, Hain C R, Semmens K, Tugrul Yilmaz M, Gao F, Otkin J A, Tetrault R (2016). The evaporative stress index as an indicator of agricultural drought in Brazil: an assessment based on crop yield impacts. Remote Sens Environ, 174: 82–99
[3]
Breiman L (1996). Bagging predictors. Mach Learn, 24(2):123–140
[4]
Breiman L (2001) Random forests. Mach Learn, 45(1): 5–32
[5]
Chen J, Chen J, Liao A P, Cao X, Chen L J, Chen X H, Peng S, Han G, Zhang H W, He C Y, Wu H, Lu M (2014). Concepts and key techniques for 30 m global land cover mapping. Acta Geodaetica et Cartographica Sinica, 43(6): 551–557
[6]
Chen J, Jonsson P, Tamura M, Gu Z H, Matsushita B, Eklundh L (2004). A simple method for reconstructing a high-quality NDVI time-series data set based on the Savitzky-Golay filter. Remote Sens Environ, 91(3‒4): 332–344
[7]
Chen J, Rao Y H, Shen M G, Wang C, Zhou Y, Ma L, Tang Y H, Yang X (2016). A simple method for detecting phenological change from time series of vegetation index. IEEE Trans Geosci Remote Sens, 54(6): 3436–3449
[8]
Clauss K, Yan H M, Kuenzer C (2016). Mapping paddy rice in China in 2002, 2005, 2010 and 2014 with MODIS time series. Remote Sens, 8(5): 434
Cover T M, Hart P E (1967). Nearest neighbor pattern classification. IEEE Trans Inf Theory, 13(1): 21–27
[11]
Demir B, Bovolo F, Bruzzone L (2013). Updating land-cover maps by classification of image time series: a novel change-detection-driven transfer learning approach. IEEE Trans Geosci Remote Sens, 51(1): 300–312
[12]
Du P J, Xia J S, Zhang W, Tan K, Liu Y, Liu S C (2012). Multiple classifier system for remote sensing image classification: a review. Sensors (Basel), 12(4): 4764–4792
[13]
Eklundh L, Jönsson P (2015). Timesat 3.2 software mannual. Lund and Malmö University, Sweden
[14]
Fensholt R, Proud SR (2012). Evaluation of earth observation based global long term vegetation trends- Comparing GIMMS and MODIS global NDVI time series. Remote Sens Environ, 119: 131–147
[15]
Fernandez-Delgado M, Cernadas E, Barro S, Amorim D (2014). Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res, 15: 3133–3181
[16]
Foody G M (2004). Thematic map comparison: evaluating the statistical significance of differences in classification accuracy. Photogramm Eng Remote Sensing, 70(5): 627–633
[17]
Ghosh S, Mishra D R, Gitelson A A (2016). Long-term monitoring of biophysical characteristics of tidal wetlands in the northern Gulf of Mexico−A methodological approach using MODIS. Remote Sens Environ, 173: 39–58 doi:10.1016/j.rse.2015.11.015
[18]
Gómez C, White J C, Wulder M A (2016). Optical remotely sensed time series data for land cover classification: a review. ISPRS J Photogramm Remote Sens, 116: 55–72
[19]
Guan X D, Huang C, Liu G H, Meng X L, Liu Q S (2016). Mapping rice cropping systems in Vietnam using an NDVI-based time-series similarity measurement based on DTW distance. Remote Sens, 8(1): 19
[20]
Han G F, Xu J H (2013). Land surface phenology and land surface temperature changes along an urban-rural gradient in Yangtze River Delta, China. Environ Manage, 52(1): 234–249
[21]
Heremans S, Suykens J A K, Van Orshoven J (2016). The effect of imposing ‘fractional abundance constraints’ onto the multilayer perceptron for sub-pixel land cover classification. Int J Appl Earth Obs Geoinf, 44: 226–238
[22]
Hmimina G, Dufrêne E, Pontailler J Y, Delpierre N, Aubinet M, Caquet B, de Grandcourt A, Burban B, Flechard C, Granier A, Gross P, Heinesch B, Longdoz B, Moureaux C, Ourcival J M, Rambal S, Saint André L, Soudani K (2013). Evaluation of the potential of MODIS satellite data to predict vegetation phenology in different biomes: an investigation using ground-based NDVI measurements. Remote Sens Environ, 132: 145–158
[23]
Ho T K (1998). The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell, 20(8): 832–844
[24]
Huete A, Didan K, Miura T, Rodriguez E P, Gao X, Ferreira L G (2002). Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens Environ, 83(1‒2): 195–213
[25]
Jönsson P, Eklundh L (2002). Seasonality extraction by function fitting to time-series of satellite sensor data. IEEE Trans Geosci Remote Sens, 40(8): 1824–1832
[26]
Karalas K, Tsagkatakis G, Zervakis M, Tsakalides P (2016). Land classification using remotely sensed data: going multilabel. IEEE Trans Geosci Remote Sens, 54(6): 3548–3563
[27]
Li J, Bioucas-Dias J M, Plaza A (2011). Hyperspectral image segmentation using a new Bayesian approach with active learning. IEEE Trans Geosci Remote Sens, 49(10): 3947–3960
[28]
Li M M, Mao Z C, Song Y, Liu M X, Huang X (2015). Impacts of the decadal urbanization on thermally induced circulations in eastern China. J Appl Meteorol Climatol, 54(2): 259–282
[29]
Marston C G, Giraudoux P, Armitage R P, Danson F M, Reynolds S C, Wang Q, Qiu J M, Craig P S (2016). Vegetation phenology and habitat discrimination: impacts for E. multilocularis transmission host modelling. Remote Sens Environ, 176: 320–327
[30]
Nascimento J M P, Dias J M B (2005). Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans Geosci Remote Sens, 43(4): 898–910
[31]
Qader S H, Dash J, Atkinson P M, Rodriguez-Galiano V (2016). Classification of vegetation type in Iraq using satellite-based phenological parameters. IEEE J Sel Top Appl Earth Obs Remote Sens, 9(1): 414–424
[32]
Qiu B W, Feng M, Tang Z H (2016). A simple smoother based on continuous wavelet transform: comparative evaluation based on the fidelity, smoothness and efficiency in phenological estimation. Int J Appl Earth Obs Geoinf, 47: 91–101
[33]
Rodriguez J J, Kuncheva L I, Alonso C J (2006). Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell, 28(10): 1619–1630
[34]
Shao Y, Lunetta R S, Wheeler B, Iiames J S, Campbell J B (2016). An evaluation of time-series smoothing algorithms for land-cover classifications using MODIS-NDVI multi-temporal data. Remote Sens Environ, 174: 258–265
[35]
Shi J J, Huang J F (2015). Monitoring spatio-temporal distribution of rice planting area in the Yangtze River Delta region using MODIS images. Remote Sens, 7(7): 8883–8905
[36]
Verbesselt J, Hyndman R, Newnham G, Culvenor D (2010). Detecting trend and seasonal changes in satellite image time series. Remote Sens Environ, 114(1): 106–115
[37]
Verger A, Filella I, Baret F, Penuelas J (2016). Vegetation baseline phenology from kilometric global LAI satellite products. Remote Sens Environ, 178: 1–14
[38]
Wardlow B D, Egbert S L, Kastens J H (2007). Analysis of time-series MODIS 250 m vegetation index data for crop classification in the US Central Great Plains. Remote Sens Environ, 108(3): 290–310
[39]
Wei H Y, Heilman P, Qi J G, Nearing M A, Gu Z H, Zhang Y G (2012). Assessing phenological change in China from 1982 to 2006 using AVHRR imagery. Front Earth Sci, 6(3): 227–236
[40]
Wohlfart C, Liu G H, Huang C, Kuenzer C (2016). A river basin over the course of time: multi-temporal analyses of land surface dynamics in the Yellow River Basin (China) based on medium resolution remote sensing data. Remote Sens, 8(3): 186
[41]
Wright J, Yang A Y, Ganesh A, Sastry S S, Ma Y (2009). Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell, 31(2): 210–227
[42]
Xia J S, Dalla Mura M, Chanussot J, Du P J, He X Y (2015). Random subspace ensembles for hyperspectral image classification with extended morphological attribute profiles. IEEE Trans Geosci Remote Sens, 53(9): 4768–4786
[43]
Xia J S, Du P J, He X Y, Chanussot J (2014). Hyperspectral remote sensing image classification based on rotation forest. IEEE Geosci Remote Sens Lett, 11(1): 239–243
[44]
Xue Z H, Du P J, Feng L (2014a). Phenology-driven land cover classification and trend analysis based on long-term remote sensing image series. IEEE J Sel Top Appl Earth Obs Remote Sens, 7(4): 1142–1156
[45]
Xue Z H, Du P J, Su H J (2014b). Harmonic analysis for hyperspectral image classification integrated with PSO optimized SVM. IEEE J Sel Top Appl Earth Obs Remote Sens, 7(6): 2131–2146
[46]
Xue Z H, Li J, Cheng L, Du P J (2015). Spectral-spatial classification of hyperspectral data via morphological component analysis-based image separation. IEEE Trans Geosci Remote Sens, 53(1): 70–84
[47]
Zeng L L, Wardlow B D, Wang R, Shan J, Tadesse T, Hayes M J, Li D R (2016). A hybrid approach for detecting corn and soybean phenology with time-series MODIS data. Remote Sens Environ, 181: 237–250
[48]
Zhang B H, Zhang L, Xie D, Yin X L, Liu C J, Liu G (2016). Application of synthetic NDVI time series blended from Landsat and MODIS data for grassland biomass estimation. Remote Sensing, 8: 10
[49]
Zhang C, Ma Y (2012). Ensemble Machine Learning. Springer Verlag New York
[50]
Zhang X Y, Zhang Q Y (2016). Monitoring interannual variation in global crop yield using long-term AVHRR and MODIS observations. ISPRS J Photogramm Remote Sens, 114: 191–205
[51]
Zhao B, Yan Y, Guo H Q, He M M, Gu Y J, Li B (2009). Monitoring rapid vegetation succession in estuarine wetland using time series MODIS-based indicators: an application in the Yangtze River Delta area. Ecol Indic, 9(2): 346–356
[52]
Zhao J J, Wang Y Y, Zhang Z X, Zhang H Y, Guo X Y, Yu S, Du W L, Huang F (2016). The variations of land surface phenology in northeast China and its responses to climate change from 1982 to 2013. Remote Sens, 8(5): 400
[53]
Zhou D C, Zhao S Q, Zhang L X, Liu S G (2016). Remotely sensed assessment of urbanization effects on vegetation phenology in China’s 32 major cities. Remote Sens Environ, 176: 272–281
[54]
Zhu C M, Lu D S, Victoria D, Dutra L V (2016). Mapping fractional cropland distribution in Mato Grosso, Brazil using time series MODIS enhanced vegetation index and Landsat thematic mapper data. Remote Sens, 8: 22
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
AI Summary 中Eng×
Note: Please be aware that the following content is generated by artificial intelligence. This website is not responsible for any consequences arising from the use of this content.