A Grad-CAM and capsule network hybrid method for remote sensing image scene classification
Zhan HE, Chunju ZHANG, Shu WANG, Jianwei HUANG, Xiaoyun ZHENG, Weijie JIANG, Jiachen BO, Yucheng YANG
A Grad-CAM and capsule network hybrid method for remote sensing image scene classification
Remote sensing image scene classification and remote sensing technology applications are hot research topics. Although CNN-based models have reached high average accuracy, some classes are still misclassified, such as “freeway,” “spare residential,” and “commercial_area.” These classes contain typical decisive features, spatial-relation features, and mixed decisive and spatial-relation features, which limit high-quality image scene classification. To address this issue, this paper proposes a Grad-CAM and capsule network hybrid method for image scene classification. The Grad-CAM and capsule network structures have the potential to recognize decisive features and spatial-relation features, respectively. By using a pre-trained model, hybrid structure, and structure adjustment, the proposed model can recognize both decisive and spatial-relation features. A group of experiments is designed on three popular data sets with increasing classification difficulties. In the most advanced experiment, 92.67% average accuracy is achieved. Specifically, 83%, 75%, and 86% accuracies are obtained in the classes of “church,” “palace,” and “commercial_area,” respectively. This research demonstrates that the hybrid structure can effectively improve performance by considering both decisive and spatial-relation features. Therefore, Grad-CAM-CapsNet is a promising and powerful structure for image scene classification.
image scene classification / CNN / Grad-CAM / CapsNet / DenseNet
[1] |
AbaiZ, Rajmalwar N (2019). DenseNet models for tiny imagenet classification. arXiv preprint arXiv: 1904.10429
|
[2] |
Ahmed A, Jalal A, Kim K (2020). A novel statistical method for scene classification based on multi-object categorization and logistic regression.Sensors (Basel), 20(14): 3871
CrossRef
Google scholar
|
[3] |
Bai S (2016). Growing random forest on deep convolutional neural networks for scene categorization.Expert Systems with Applications, 71: 279–287
CrossRef
Google scholar
|
[4] |
CastelluccioM, PoggiG, SansoneC (2015). Land use classification in remote sensing images by convolutional neural networks. arXiv preprint arXiv: 1508.00092
|
[5] |
Chaib S, Liu H, Gu Y, Yao H (2017). Deep feature fusion for VHR remote sensing scene classification.IEEE Trans Geosci Remote Sens, 55(8): 4775–4784
CrossRef
Google scholar
|
[6] |
Chen J, Wang C, Ma Z, Chen J, He D, Ackland S (2018). Remote sensing scene classification based on convolutional neural networks pre-trained using attention-guided sparse filters.Remote Sens (Basel), 10(2): 290
CrossRef
Google scholar
|
[7] |
Cheng G, Han J, Lu X (2017a). Remote sensing image scene classification: benchmark and state of the art.Proc IEEE, 105(10): 1865–1883
CrossRef
Google scholar
|
[8] |
Cheng G, Li Z, Yao X, Guo L, Wei Z (2017b). Remote sensing image scene classification using bag of convolutional features.IEEE Geosci Remote Sens Lett, 14(10): 1735–1739
CrossRef
Google scholar
|
[9] |
Cheng G, Yang C, Yao X, Guo L, Han J (2018). When deep learning meets metric learning: remote sensing image scene classification via learning discriminative CNNs.IEEE Trans Geosci Remote Sens, 56(5): 2811–2821
CrossRef
Google scholar
|
[10] |
Cheng G, Zhou P, Han J (2016). Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images.IEEE Trans Geosci Remote Sens, 54(12): 7405–7415
CrossRef
Google scholar
|
[11] |
FanR, WangL, FengR (2019). Attention based residual network for high-resolution remote sensing imagery scene classification. In: IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 1346–1349
|
[12] |
Gan J, Li Q, Zhang Z, Wang J (2016). Two-level feature representation for aerial scene classification.IEEE Geosci Remote Sens Lett, 13(11): 1626–1630
CrossRef
Google scholar
|
[13] |
Gong C, Han J, Lu X (2017). Remote sensing image scene classification: benchmark and state of the art.In: Proceedings of the IEEE, 105(10): 1865–1883
CrossRef
Google scholar
|
[14] |
HouQ, ZhouD, FengJ (2021). Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13713–13722
|
[15] |
KingmaD P, BaJ (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv: 1412.6980
|
[16] |
Knorn J, Rabe A, Radeloff V C, Kuemmerle T, Kozak J, Hostert P (2009). Land cover mapping of large areas using chain classification of neighboring Landsat satellite images.Remote Sens Environ, 113(5): 957–964
CrossRef
Google scholar
|
[17] |
Lei R, Zhang C, Liu W, Zhang L, Zhang X, Yang Y, Huang J, Li Z, Zhou Z (2021). Hyperspectral remote sensing image classification using deep convolutional capsule network.IEEE J Sel Top Appl Earth Obs Remote Sens, 14: 8297–8315
CrossRef
Google scholar
|
[18] |
Lei R, Zhang C, Zhang X, Huang J, Li Z, Liu W, Cui H (2022). Multiscale feature aggregation capsule neural network for hyperspectral remote sensing image classification.Remote Sens (Basel), 14(7): 1652
CrossRef
Google scholar
|
[19] |
Li J, Lin D, Wang Y, Xu G, Zhang Y, Ding C, Zhou Y (2020). Deep discriminative representation learning with attention map for scene classification.Remote Sens (Basel), 12(9): 1366
CrossRef
Google scholar
|
[20] |
LiuY, ChengM M, HuX (2017). Richer convolutional features for edge detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5872–5881
|
[21] |
Liu Y, Huang C (2018). Scene classification via triplet networks.IEEE J Sel Top Appl Earth Obs Remote Sens, 11(1): 220–237
CrossRef
Google scholar
|
[22] |
Marmanis D, Datcu M, Esch T, Stilla U (2016). Deep learning earth observation classification using imagenet pretrained networks.IEEE Geosci Remote Sens Lett, 13(1): 105–109
CrossRef
Google scholar
|
[23] |
Mei X, Pan E, Ma Y, Dai X, Huang J, Fan F, Du Q, Zheng H, Ma J (2019). Spectral-spatial attention networks for hyperspectral image classification.Remote Sens (Basel), 11(8): 963
CrossRef
Google scholar
|
[24] |
Pan Z, Xu J, Guo Y, Hu Y, Wang G (2020). Deep learning segmentation and classification for urban village using a worldview satellite image based on U-Net.Remote Sens (Basel), 12(10): 1574
CrossRef
Google scholar
|
[25] |
Pires de Lima R, Marfurt K (2019). Convolutional neural network for remote-sensing scene classification: transfer learning analysis.Remote Sens (Basel), 12(1): 86
CrossRef
Google scholar
|
[26] |
Raiyani K, Gonçalves T, Rato L, Salgueiro P, Marques da Silva J R (2021). Sentinel-2 image scene classification: a comparison between Sen2Cor and a machine learning approach.Remote Sens (Basel), 13(2): 300
CrossRef
Google scholar
|
[27] |
Raza A, Huo H, Sirajuddin S, Fang T (2020). Diverse capsules network combining multiconvolutional layers for remote sensing image scene classification.IEEE J Sel Top Appl Earth Obs Remote Sens, 13: 5297–5313
CrossRef
Google scholar
|
[28] |
SabourS, FrosstN, HintonG E (2017). Dynamic routing between capsules. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), 3859–3869
|
[29] |
Sheng G, Yang W, Xu T, Sun H (2012). High-resolution satellite scene classification using a sparse coding based multiple feature combination.Int J Remote Sens, 33(8): 2395–2412
CrossRef
Google scholar
|
[30] |
Sun X, Zhu Q, Qin Q (2021). A multi-level convolution pyramid semantic fusion framework for high-resolution remote sensing image scene classification and annotation.IEEE Access, 9: 18195–18208
CrossRef
Google scholar
|
[31] |
Szegedy C, Ioffe S, Vanhoucke V, Alemi A A (2017). Inception-v4, inception-ResNet and the impact of residual connections on learning. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI'17). AAAI Press, 4278–4284
|
[32] |
Tian T, Liu X, Wang L (2019a). Remote sensing scene classification based on res-capsnet. In: IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium.IEEE, 2019: 525–528
|
[33] |
Tian X, An J, Mu G (2019b). Power System Transient Stability Assessment Method Based on CapsNet. In: 2019 IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia).IEEE, 2019: 1159–1164
|
[34] |
Tong W, Chen W, Han W, Li X, Wang L (2020). Channel-attention-based DenseNet network for remote sensing image scene classification.IEEE J Sel Top Appl Earth Obs Remote Sens, 13: 4121–4132
CrossRef
Google scholar
|
[35] |
Vo T, Tran D, Ma W (2015). Tensor decomposition and application in image classification with histogram of oriented gradients.Neurocomputing, 165: 38–45
CrossRef
Google scholar
|
[36] |
Wang Y, Zhang J, Kan M (2020). Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation.In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 12275–12284
|
[37] |
Weng Q, Mao Z, Lin J, Guo W (2017). Land-use classification via extreme learning classifier based on deep convolutional features.IEEE Geosci Remote Sens Lett, 14(5): 704–708
CrossRef
Google scholar
|
[38] |
Xia G S, Hu J, Hu F, Shi B, Bai X, Zhong Y, Zhang L, Lu X (2017). AID: A benchmark data set for performance evaluation of aerial scene classification.IEEE Trans Geosci Remote Sens, 55(7): 3965–3981
CrossRef
Google scholar
|
[39] |
Yang Y, Newsam S (2010). Bag-of-visual-words and spatial extensions for land-use classification.In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2010: 270–279
|
[40] |
Yu C, Gao C, Wang J, Yu G, Shen C, Sang N (2021). Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation.Int J Comput Vis, 129(11): 3051–3068
CrossRef
Google scholar
|
[41] |
Yu Y, Liu F (2018a). A two-stream deep fusion framework for high-resolution aerial scene classification.Comput Intell Neurosci, 2018: 8639367
CrossRef
Google scholar
|
[42] |
Yu Y, Liu F (2018b). Dense connectivity based two-stream deep feature fusion framework for aerial scene classification.Remote Sens (Basel), 10(7): 1158
CrossRef
Google scholar
|
[43] |
Zhang W, Tang P, Zhao L (2019). Remote sensing image scene classification using CNN-CapsNet.Remote Sens (Basel), 11(5): 494
CrossRef
Google scholar
|
[44] |
Zhang X, Wang G, Zhao S G (2022). CapsNet-COVID19: Lung CT image classification method based on CapsNet model.Math Biosci Eng, 19(5): 5055–5074
CrossRef
Google scholar
|
[45] |
Zhao B, Zhong Y, Zhang L, Huang B (2016). The Fisher kernel coding framework for high spatial resolution scene classification.Remote Sens (Basel), 8(2): 157
CrossRef
Google scholar
|
[46] |
Zhao D, Chen Y, Lv L (2017). Deep reinforcement learning with visual attention for vehicle classification.IEEE Trans Cogn Dev Syst, 9(4): 356–367
CrossRef
Google scholar
|
[47] |
Zhao X, Zhang J, Tian J, Zhuo L, Zhang J (2020). Residual dense network based on channel-spatial attention for the scene classification of a high-resolution remote sensing image.Remote Sens (Basel), 12(11): 1887
CrossRef
Google scholar
|
[48] |
Zhou B, Khosla A, Lapedriza A (2016). Learning deep features for discriminative localization.Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2921–2929
|
/
〈 | 〉 |