1. School of Earth Sciences and Engineering, Hohai University, Nanjing 210098, China
2. School of Earth Sciences and Resources, China University of Geosciences (Beijing), Beijing 100083, China
3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
4. Key Laboratory of Virtual Geographic Environment (Ministry of Education), Nanjing Normal University, Nanjing 210023, China
5. Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080-3021, USA
hjsu@hhu.edu.cn (Hongjun SU)
sftian@cugb.edu.cn (Shufang TIAN)
History: Received 2016-02-17; Revised 2016-10-12; Accepted 2016-06-26; Published 2017-11-10
Abstract
This work presents a new urban land cover classification framework using the firefly algorithm (FA) optimized extreme learning machine (ELM). FA is adopted to optimize the regularization coefficient C and the Gaussian kernel parameter σ for kernel ELM. Additionally, the effectiveness of spectral features derived from an FA-based band selection algorithm is studied for the proposed classification task. Three hyperspectral datasets recorded by different sensors, namely HYDICE, HyMap, and AVIRIS, are used. Our study shows that the proposed method outperforms traditional classification algorithms such as SVM and reduces the computational cost significantly.
Recently, the extreme learning machine (ELM) has been successfully applied to a number of classification tasks (Huang et al., 2006, 2011; Chen et al., 2014; Lv et al., 2016). As a single-hidden-layer feedforward neural network (SLFN), ELM generates its hidden node parameters randomly, independently of the training data. ELM tends to achieve better generalization performance by simultaneously minimizing the training error and the norm of the output weights. A large number of related studies have demonstrated its fast convergence and superior classification accuracy (Chen et al., 2014). Kernel-based ELM was also proposed in previous work (Liu et al., 2008; Huang et al., 2010; Pal et al., 2013; Chen et al., 2014). In particular, different variants of the ELM algorithm, such as composite-kernel ELM and ensemble ELM, have been investigated for hyperspectral image (HSI) classification (Pal et al., 2013; Chen et al., 2014; Samat et al., 2014; Tan et al., 2015). Many algorithms, such as the genetic algorithm (GA) (Zhen et al., 2000), differential evolution (Bazi et al., 2014), and particle swarm optimization (PSO) (Xue et al., 2014), have been used to optimize the parameters of ELM, but they are time consuming (Lin et al., 2014).
As a new evolutionary optimization algorithm, FA (Yang, 2009) has been adopted in clustering, multi-objective scheduling, and band selection (Senthilnath et al., 2011; Su et al., 2016) owing to its global search ability in high-dimensional spaces. Compared with the PSO algorithm, FA is more computationally efficient when dealing with multimodal functions; indeed, PSO can be regarded as a special case of FA from a theoretical perspective (Yang and He, 2013). It has been shown that the parameters of FA can be adjusted so that it outperforms both random search and PSO (Su et al., 2016). Therefore, FA is a promising approach for ELM parameter optimization.
In this paper, an FA-optimized ELM framework for HSI urban land cover and land use classification is presented. FA-based band selection is applied for dimensionality and redundancy reduction of the HSI, and FA is adopted, for the first time in the hyperspectral field, to optimize the parameters of ELM with an RBF kernel. With the selected bands and optimized parameters, more accurate and robust classification results can be obtained. The main contribution of this work is a novel HSI urban land cover classification framework that combines dimensionality reduction, parameter optimization, and classification, in which FA is utilized for both band selection and parameter optimization.
The remainder of this paper is organized as follows. Section 2 describes FA and ELM. Section 3 presents the proposed FA-inspired ELM classification framework. Section 4 reports the experimental results of the proposed method on three hyperspectral images. Section 5 concludes the paper.
Related works
ELM
ELM is a single-hidden-layer feedforward neural network (SLFN) learning algorithm. Different from traditional neural networks, the hidden layer in ELM need not be tuned. Given a set of N training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{n}$ and $\mathbf{t}_i \in \mathbb{R}^{m}$, the structure of ELM is composed of an n-dimensional input layer and a hidden layer of L nodes. The output function of ELM for generalized SLFNs is

$$f_L(\mathbf{x}) = \sum_{i=1}^{L} \boldsymbol{\beta}_i h_i(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} \quad (1)$$
Eq. (1) can be written compactly as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T} \quad (2)$$

where $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_L]^{T}$ is the vector of output weights connecting the L hidden nodes to the output layer, $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^{T}$ is the target matrix, and $\mathbf{H}$ is the hidden-layer output matrix

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}(\mathbf{x}_1) \\ \vdots \\ \mathbf{h}(\mathbf{x}_N) \end{bmatrix} = \begin{bmatrix} h_1(\mathbf{x}_1) & \cdots & h_L(\mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ h_1(\mathbf{x}_N) & \cdots & h_L(\mathbf{x}_N) \end{bmatrix} \quad (3)$$
The ith column of $\mathbf{H}$ is the output vector of the ith hidden node with respect to the inputs $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$. $\mathbf{h}(\mathbf{x})$ is a feature mapping that projects a sample from the n-dimensional input space to the L-dimensional hidden-layer feature space.
To train an SLFN, the following goal is pursued:

$$\min_{\boldsymbol{\beta}} \|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\| \quad (4)$$
ELM tends to reach not only the smallest training error but also the smallest norm of output weights. According to Bartlett's theory, to achieve the smallest training error, feedforward neural networks should have small norms of weights. The minimal-norm least-squares solution of Eq. (4) is

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T} \quad (5)$$
where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of the matrix $\mathbf{H}$.
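To make the above concrete, the following minimal NumPy sketch implements Eqs. (1)–(5); the tanh activation, the choice of L = 500 hidden nodes, and the one-hot target matrix T are illustrative assumptions rather than settings taken from this paper.

```python
import numpy as np

def elm_train(X, T, L=500, seed=0):
    """Basic ELM training, Eqs. (1)-(5): random hidden layer, then
    the minimal-norm least-squares solution beta = pinv(H) @ T."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))  # random input weights, never tuned
    b = rng.standard_normal(L)                # random hidden biases
    H = np.tanh(X @ W + b)                    # hidden-layer output matrix H (N x L)
    beta = np.linalg.pinv(H) @ T              # Moore-Penrose solution, Eq. (5)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Output function f(x) = h(x) beta, Eq. (1); class = argmax over columns."""
    return np.tanh(X @ W + b) @ beta
```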
With multi-output nodes, the constrained-optimization-based ELM can be formulated as

$$\min_{\boldsymbol{\beta}} \; \frac{1}{2}\|\boldsymbol{\beta}\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\|\boldsymbol{\xi}_i\|^{2} \quad \text{s.t.} \quad \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} = \mathbf{t}_i^{T} - \boldsymbol{\xi}_i^{T}, \quad i = 1, \ldots, N \quad (6)$$

where C is the regularization coefficient and $\boldsymbol{\xi}_i$ is the training error vector of sample i.
According to the KKT theorem, training ELM is equivalent to solving the following dual optimization problem:

$$L_{\mathrm{ELM}} = \frac{1}{2}\|\boldsymbol{\beta}\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\|\boldsymbol{\xi}_i\|^{2} - \sum_{i=1}^{N}\sum_{j=1}^{m}\alpha_{i,j}\left(\mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta}_j - t_{i,j} + \xi_{i,j}\right) \quad (7)$$

where $\alpha_{i,j}$ are the Lagrange multipliers.
The partial derivatives of $L_{\mathrm{ELM}}$ with respect to the model parameters can be represented as

$$\frac{\partial L_{\mathrm{ELM}}}{\partial \boldsymbol{\beta}_j} = 0 \Rightarrow \boldsymbol{\beta} = \mathbf{H}^{T}\boldsymbol{\alpha}, \qquad \frac{\partial L_{\mathrm{ELM}}}{\partial \boldsymbol{\xi}_i} = 0 \Rightarrow \boldsymbol{\alpha}_i = C\boldsymbol{\xi}_i, \qquad \frac{\partial L_{\mathrm{ELM}}}{\partial \boldsymbol{\alpha}_i} = 0 \Rightarrow \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} - \mathbf{t}_i^{T} + \boldsymbol{\xi}_i^{T} = 0 \quad (8)$$
Incorporating the first two conditions of Eq. (8) into the third leads to the linear system

$$\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)\boldsymbol{\alpha} = \mathbf{T} \quad (9)$$

where $\boldsymbol{\alpha} = [\boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_N]^{T}$.
From Eqs. (8) and (9), we can conclude that

$$\boldsymbol{\beta} = \mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T} \quad (10)$$

or, equivalently,

$$\boldsymbol{\beta} = \left(\frac{\mathbf{I}}{C} + \mathbf{H}^{T}\mathbf{H}\right)^{-1}\mathbf{H}^{T}\mathbf{T} \quad (11)$$
The output function of the ELM classifier is

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} = \mathbf{h}(\mathbf{x})\mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T} \quad (12)$$

or, equivalently,

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\left(\frac{\mathbf{I}}{C} + \mathbf{H}^{T}\mathbf{H}\right)^{-1}\mathbf{H}^{T}\mathbf{T} \quad (13)$$
If the feature mapping $\mathbf{h}(\mathbf{x})$ is unknown, Mercer's conditions can be applied to ELM; hence, we can define a kernel matrix for ELM as follows:

$$\boldsymbol{\Omega}_{\mathrm{ELM}} = \mathbf{H}\mathbf{H}^{T}: \quad \Omega_{\mathrm{ELM}}(i,j) = \mathbf{h}(\mathbf{x}_i)\cdot\mathbf{h}(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j) \quad (14)$$
Finally, the output function of the kernel ELM classifier can be written as

$$f(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^{T}\left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}}\right)^{-1}\mathbf{T} \quad (15)$$
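A compact sketch of the kernel ELM classifier of Eqs. (14)–(15) follows; the RBF parameterization K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)) and the default values C = 100 and σ = 0.5 are assumptions for illustration, since the paper does not spell out its exact kernel convention.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kelm_train(X, T, C=100.0, sigma=0.5):
    """Kernel ELM, Eqs. (14)-(15): solve (I/C + Omega) a = T with an RBF kernel."""
    omega = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma**2))  # Omega_ELM
    a = np.linalg.solve(np.eye(X.shape[0]) / C + omega, T)
    return a

def kelm_predict(X_test, X_train, a, sigma=0.5):
    """f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^-1 T, Eq. (15)."""
    k = np.exp(-cdist(X_test, X_train, "sqeuclidean") / (2.0 * sigma**2))
    return k @ a  # predicted class = argmax along axis 1
```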
FA
Fireflies exhibit flashing behaviors that are used for communication and for attracting potential mates or prey. Two elements are essential in the algorithm, namely the brightness I and the attractiveness β. The brightness I at a distance r follows the inverse square law combined with light absorption, so I decreases as r increases. In the FA, the brightness of a firefly is associated with its current position: a brighter firefly occupies a better position, i.e., one with a larger objective function value. Less bright fireflies move towards brighter ones; when two fireflies have the same brightness, they move randomly.
For simplicity, the following hypotheses are made in describing the FA: 1) all fireflies are unisex, so one firefly is attracted to others regardless of sex; 2) the attractiveness of a firefly is proportional to its brightness; 3) the brightness of a firefly is determined by the value of the objective function. During the movement process, I and β are updated repeatedly, and randomly distributed points gradually move towards the extreme points. After a certain number of iterations, less desirable points are eliminated and the best positions are retained.
The brightness of a firefly varies with the value of the objective function and can be defined as

$$I(r) = I_0 e^{-\gamma r^{2}} \quad (16)$$

where $I_0$ is the maximum brightness at r = 0; it is related to the value of the objective function, and a larger value means a brighter firefly. Here, $\gamma$ is the light absorption coefficient, and $r_{ij}$ is the distance between the ith and jth fireflies.
The attractiveness of a firefly is proportional to its brightness as observed by adjacent fireflies; it can be expressed as

$$\beta(r) = \beta_0 e^{-\gamma r^{2}} \quad (17)$$

where $\beta_0$ is the attractiveness when the distance between two fireflies is zero. In this research, we simply set $I(r) = \beta(r)$. The equation that updates the jth firefly's location under the attraction of the ith firefly can be described as

$$\mathbf{x}_j = \mathbf{x}_j + \beta_0 e^{-\gamma r_{ij}^{2}}(\mathbf{x}_i - \mathbf{x}_j) + \alpha\left(\mathrm{rand} - \tfrac{1}{2}\right) \quad (18)$$

where $\mathbf{x}_i$ and $\mathbf{x}_j$ are the current positions of the ith and jth fireflies, respectively, $\alpha$ is a constant within [0,1], and rand is a random number within [0,1]. For m fireflies, the new location of $\mathbf{x}_j$ is determined after considering all other fireflies:

$$\mathbf{x}_j = \mathbf{x}_j + \sum_{i=1, i\neq j}^{m} \beta_0 e^{-\gamma r_{ij}^{2}}(\mathbf{x}_i - \mathbf{x}_j) + \alpha\left(\mathrm{rand} - \tfrac{1}{2}\right) \quad (19)$$
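The following sketch shows one sweep of the update rules in Eqs. (17)–(19), assuming the caller re-evaluates brightness after each sweep; the in-place update order and the fixed seed are common implementation choices, not prescribed by the paper.

```python
import numpy as np

def firefly_iteration(pos, brightness, beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """One FA sweep over m fireflies, Eqs. (16)-(19): each firefly j moves
    toward every brighter firefly i; the random term keeps ties moving."""
    rng = np.random.default_rng(seed)
    m, d = pos.shape
    for j in range(m):
        for i in range(m):
            if brightness[i] > brightness[j]:        # firefly i is brighter
                r2 = np.sum((pos[i] - pos[j]) ** 2)  # squared distance r_ij^2
                att = beta0 * np.exp(-gamma * r2)    # attractiveness, Eq. (17)
                pos[j] = pos[j] + att * (pos[i] - pos[j]) \
                         + alpha * (rng.random(d) - 0.5)  # update, Eq. (18)
    return pos
```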
FA-optimized ELM
In order to improve the performance of ELM for HSI classification, FA is adopted to optimize the regularization coefficient C and the Gaussian kernel parameter σ of kernel ELM. The steps can be described as follows.
1) Parameter initialization: maximum number of iterations t = 100, step size α = 0.2, light absorption coefficient γ = 1, population size m = 10, and maximum attractiveness β0 = 1.
2) Brightness computation with Eq. (16). The distance r between two fireflies is the index distance between two different parameter combinations.
3) Relative brightness (attractiveness) β computation with Eq. (17). The direction of firefly movement is determined by the value of the relative brightness β.
4) Movement status estimation using Eqs. (18) and (19).
5) Firefly brightness re-estimation given an updated location.
6) Repeat steps 2) to 5) until the maximum number of iterations is reached. The final selected parameters are the global optimum solution. A minimal sketch of this search loop is given below.
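In the sketch that follows, the FA defaults mirror the settings listed in step 1 (m = 10, 100 iterations, α = 0.2, γ = 1, β0 = 1); the `objective` callback, the box `bounds`, and the clipping step are illustrative assumptions rather than details given in the paper.

```python
import numpy as np

def fa_optimize(objective, bounds, m=10, iters=100, beta0=1.0, gamma=1.0, alpha=0.2):
    """FA search over a parameter vector (here: the kernel ELM pair (C, sigma)).
    `objective` maps a parameter vector to validation accuracy (the brightness);
    `bounds` is a list of (low, high) pairs, one per parameter."""
    rng = np.random.default_rng(0)
    lo, hi = np.array(bounds, dtype=float).T
    pos = lo + rng.random((m, len(lo))) * (hi - lo)   # step 1: random swarm
    light = np.array([objective(p) for p in pos])     # step 2: brightness
    for _ in range(iters):                            # step 6: iterate
        for j in range(m):
            for i in range(m):
                if light[i] > light[j]:               # steps 3-4: move toward brighter
                    r2 = np.sum((pos[i] - pos[j]) ** 2)
                    att = beta0 * np.exp(-gamma * r2)
                    pos[j] += att * (pos[i] - pos[j]) + alpha * (rng.random(len(lo)) - 0.5)
            pos[j] = np.clip(pos[j], lo, hi)          # keep parameters in range
            light[j] = objective(pos[j])              # step 5: re-estimate brightness
    best = int(np.argmax(light))
    return pos[best], light[best]

# Hypothetical usage: bounds = [(1e-5, 1e5), (1e-3, 1.0)] for (C, sigma), with
# `objective` training a kernel ELM and returning accuracy on a validation split.
```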
Proposed FA-inspired classification framework
The proposed FA-inspired classification framework, with FA-based band selection and FA-optimized ELM, is shown in Fig. 1. The HSI classification framework can be divided into two parts: band selection and parameter optimization. Here, FA is used to select the most informative bands based on the minimum estimated abundance covariance (MEAC) and Jeffries–Matusita (JM) measures; ELM is then applied to HSI classification with the optimized parameter combination (C, σ). With the selected bands and optimal parameters, the framework yields rapid and accurate HSI classification.
Experiments and analysis
Parameter setting
The range of the regularization coefficient C for ELM is [10^−5, 10^5], and that of the Gaussian kernel parameter σ is (0, 1]. Meanwhile, the range of the regularization coefficient in SVM is [10^−5, 10^5], and that of its Gaussian kernel parameter is [10^−5, 10^5]. Moreover, the parameters of each bio-inspired algorithm (GA, PSO, and FA) are chosen based on expert experience; for a fair comparison, the maximum number of iterations (100) and the population size (10) are the same for all algorithms. The other parameters of GA are: crossover probability G_gap = 0.9 and mutation probability P_mutation = 0.07. The other parameters of PSO are: inertia weight W = 1 and accelerating factors C1 = 1.7 and C2 = 1.5. The other parameters of FA are: light absorption coefficient γ = 1, maximum attractiveness β0 = 1, and step size α = 0.2. The details are shown in Table 1.
Compared methods
SVM is a machine learning method with strong generalization ability, derived from statistical learning theory and the VC dimension. PSO is a bio-inspired optimization algorithm modeled on the foraging behavior of bird flocks, and it has shown its merit in solving practical problems. GA is another optimization algorithm for parameter searching; it finds the optimal solution by imitating natural selection and genetic mechanisms. Through the basic GA operations of selection, crossover, and mutation, the "chromosomes" converge to the fittest one, which is the optimal solution.
Experimental data
The HYDICE subimage scene in Fig. 2(a) with 304 × 301 pixels over the Washington DC Mall area was used in the first experiment. After bad band removal, 191 bands were left in this experiment. There are six classes, namely roof, tree, grass, water, road, and trail. These six class centers are used for band selection. The available training and test samples are shown in Table 2.
The second dataset is from a flightline over the Purdue University West Lafayette campus. The hyperspectral data were collected on September 30, 1999 with the airborne Hyperspectral Mapper (HyMap) system, providing image data in 128 spectral bands in the visible and infrared regions (0.4–2.4 μm). In this experiment, 126 bands were used after removing the atmospheric water absorption bands. The system was flown at an altitude such that the pixel size is about 3.5 m. An image of the scene is shown in Fig. 2(b). The training and testing samples are listed in Table 2.
The third dataset is a small subscene of the AVIRIS Salinas image. It comprises 86 × 83 pixels located within the full scene at [samples, lines] = [591–676, 158–240] and includes six classes. An image of the scene is shown in Fig. 2(c). The sample information is listed in Table 2.
Results and analysis
HYDICE data were used in the first experiment. For a more in-depth comparison of the different algorithms, the number of selected bands was varied from 5 to 15, and the optimized ELM and SVM classifiers were used for classification. As shown in Fig. 3(a), FA-ELM yields higher accuracy than GA-ELM and PSO-ELM when the same number of bands is used for classification. After parameter optimization, the classification accuracy of FA-ELM reaches 96.81%. With the SVM classifier, FA-SVM also achieves better results than GA-SVM and PSO-SVM, confirming the superior performance of FA. Under all three optimization algorithms, ELM was slightly more accurate than SVM. As shown in Fig. 4(a), classification with FA-selected bands outperforms classification with all bands for the optimized ELM classifier, which means that FA band selection is also an effective way to improve classification performance. FA's superiority is likely due to its search pattern, which is more effective than those of PSO and GA.
In the second experiment, with HyMap data, the classification results are shown in Fig. 3(b). The highest accuracy of all, 95.79%, belongs to the FA-ELM approach. All other algorithms performed quite similarly, with GA yielding the worst results. FA-ELM, PSO-ELM, and GA-ELM outperform FA-SVM, PSO-SVM, and GA-SVM, respectively. As shown in Fig. 3(b), the overall classification accuracy rose steadily as the number of selected bands increased. The classification accuracy with FA optimization improves by up to 7%, the combined effect of feature selection and parameter optimization.
In the third experiment, with AVIRIS data, a cross-validation method was used for classification. As shown in Fig. 3(c), ELM performs better than SVM in most cases. The best accuracy of all, 99.7%, belongs to the FA-ELM approach, which means FA can also obtain optimal results for AVIRIS data. The classification accuracies of the other two optimization algorithms are lower than those of FA. The results show that the accuracy after FA optimization is much higher than the non-optimized baseline. Once again, this experiment demonstrates that FA achieves better learning performance owing to its outstanding search ability and convergence speed.
Computing time
To further illustrate the performance of the ELM and SVM classifiers, the running times of the different algorithms on the three hyperspectral datasets are listed in Table 3. The running time of ELM is much lower than that of SVM, which means ELM is the more efficient classifier. For SVM, the main reason for the slow training speed is the traditional standard quadratic optimization process, whose principal computational cost comes from computing the corresponding Lagrange multipliers α. In contrast, ELM computes a simple closed-form solution. More importantly, when the data scale is large or its dimensionality is high, instead of inverting the N × N matrix in Eq. (10), ELM can find a solution based on Eq. (11), in which only the L × L matrix (I/C + H^T H) is inverted. L represents the number of hidden nodes, which is far smaller than the number of training samples (L << N); hence the computational cost is reduced dramatically, as sketched below.
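As a rough sketch of this cost argument, assuming H is the N × L hidden-layer output matrix:

```python
import numpy as np

def elm_solve_small(H, T, C):
    """Solution via Eq. (11): beta = (I/C + H^T H)^-1 H^T T.
    Only an L x L system is solved; with L << N this costs about
    O(N L^2 + L^3), versus O(N^3) for the N x N matrix in Eq. (10)."""
    L = H.shape[1]
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
```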
From Table 3, the learning speed of ELM is higher than that of SVM; on average, ELM runs six times faster. The results also show that, as the number of selected bands increases, SVM's time consumption grows much faster than ELM's. Since the hidden-layer weights of ELM require no iterative re-adjustment, it runs faster than SVM. Therefore, for processing large-scale data it is advantageous to use ELM rather than SVM.
Conclusions
In this paper, an FA-inspired framework with band selection and optimized ELM for HSI urban classification has been proposed. The main contributions are: 1) a novel HSI urban classification framework that combines dimensionality reduction, parameter optimization, and classification; and 2) the use of FA for both band selection and parameter optimization. Experiments on three different HSI datasets confirm the significant improvement in classification accuracy obtained by the proposed method.
References

Bao Y, Tian Q, Chen M (2015a). A weighted algorithm based on normalized mutual information for estimating the chlorophyll-a concentration in inland waters using geostationary ocean color imager (GOCI) data. Remote Sens, 7(9): 11731–11752

Bao Y, Tian Q, Chen M, Lin H (2015b). An automatic extraction method for individual tree crowns based on self-adaptive mutual information and tile computing. Int J Digit Earth, 8(6): 495–516

Bazi Y, Alajlan N, Melgani F, AlHichri H, Malek S, Yager R R (2014). Differential evolution extreme learning machine for the classification of hyperspectral images. IEEE Geosci Remote Sens Lett, 11(6): 1066–1070

Bioucas-Dias J, Plaza A, Camps-Valls G, Scheunders P, Nasrabadi N, Chanussot J (2013). Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci Remote Sens Mag, 1(2): 6–36

Camps-Valls G, Tuia D, Bruzzone L, Benediktsson J A (2014). Advances in hyperspectral image classification: earth monitoring with statistical learning methods. IEEE Signal Process Mag, 31(1): 45–54

Chang C I (2003). Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer Academic/Plenum Publishers, 13–15

Chen C, Li W, Su H, Liu K (2014). Spectral-spatial classification of hyperspectral image based on kernel extreme learning machine. Remote Sens, 6(6): 5795–5814

Cheng G, Zhu F, Xiang S, Wang Y, Pan C (2016). Semisupervised hyperspectral image classification via discriminant analysis and robust regression. IEEE J Sel Top Appl Earth Obs Remote Sens, 9(2): 595–608

Cortes C, Vapnik V (1995). Support-vector networks. Mach Learn, 20(3): 273–297

de Morsier F, Borgeaud M, Gass V, Thiran J-P, Tuia D (2016). Kernel low-rank and sparse graph for unsupervised and semi-supervised classification of hyperspectral images. IEEE Trans Geosci Remote Sens, 54(6): 1–11

Hu F, Xia G, Wang Z, Huang X, Zhang L, Sun H (2015). Unsupervised feature learning via spectral clustering of multi-dimensional patches for remotely sensed scene classification. IEEE J Sel Top Appl Earth Obs Remote Sens, 8(5): 2015–2030

Huang G B, Ding X, Zhou H (2010). Optimization method based extreme learning machine for classification. Neurocomputing, 74(1–3): 155–163

Huang G B, Wang D, Lan Y (2011). Extreme learning machines: a survey. Int J Mach Learn Cybern, 2(2): 107–122

Huang G B, Zhu Q Y, Siew C K (2006). Extreme learning machine: theory and applications. Neurocomputing, 70(1–3): 489–501

Li W, Du Q, Zhang F, Hu W (2015). Collaborative representation based nearest neighbor classifier for hyperspectral imagery. IEEE Geosci Remote Sens Lett, 12(2): 389–393

Lin J, Huang B, Chen M, Huang Z (2014). Modeling urban vertical growth using cellular automata: Guangzhou as a case study. Appl Geogr, 53: 172–186

Liu Q, He Q, Shi Z (2008). Extreme support vector machine classifier. Lect Notes Comput Sci, 5012: 222–233

Lv Q, Niu X, Dou Y, Xu J, Lei Y (2016). Classification of hyperspectral remote sensing image using hierarchical local-receptive-field-based extreme learning machine. IEEE Geosci Remote Sens Lett, 13(3): 1–5

Melgani F, Bruzzone L (2004). Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans Geosci Remote Sens, 42(8): 1778–1790

Pal M, Maxwell A E, Warner T A (2013). Kernel-based extreme learning machine for remote-sensing image classification. Remote Sens Lett, 4(9): 853–862

Ratle F, Camps-Valls G, Weston J (2010). Semisupervised neural networks for efficient hyperspectral image classification. IEEE Trans Geosci Remote Sens, 48(5): 2271–2282

Samat A, Du P, Liu S, Li J, Cheng L (2014). E2LMs: ensemble extreme learning machines for hyperspectral image classification. IEEE J Sel Top Appl Earth Obs Remote Sens, 7(4): 1060–1069

Senthilnath J, Omkar S N, Mani V (2011). Clustering using firefly algorithm: performance study. Swarm Evol Comput, 1(3): 164–171

Su H, Yong B, Du Q (2016). Hyperspectral band selection using improved firefly algorithm. IEEE Geosci Remote Sens Lett, 13(1): 68–72

Tan K, Zhou S, Du Q (2015). Semi-supervised discriminant analysis for hyperspectral imagery with block-sparse graph. IEEE Geosci Remote Sens Lett, 12(8): 1765–1769

Xue Z, Du P, Su H (2014). Harmonic analysis for hyperspectral image classification integrated with PSO optimized SVM. IEEE J Sel Top Appl Earth Obs Remote Sens, 7(6): 2131–2146

Yang X, He X (2013). Firefly algorithm: recent advances and applications. Int J Swarm Intell, 1(1): 36–50

Yang X S (2009). Firefly algorithms for multimodal optimization. In: Stochastic Algorithms: Foundations and Applications. Berlin, Heidelberg: Springer-Verlag, 169–178

Zhang L, Zhang L, Tao D, Huang X (2012). On combining multiple features for hyperspectral remote sensing image classification. IEEE Trans Geosci Remote Sens, 50(3): 879–893

Zhang L, Zhang Q, Zhang L, Tao D, Huang X, Du B (2015). Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding. Pattern Recognit, 48(10): 3102–3112

Zhen Z, Zhen H, Li P (2000). The parameters selection of genetic algorithms in texture classification. Acta Geodaetica et Cartographica Sinica, 29(1): 36–39
Rights & Permissions: © Higher Education Press and Springer-Verlag Berlin Heidelberg