A Grad-CAM and capsule network hybrid method for remote sensing image scene classification

Zhan HE , Chunju ZHANG , Shu WANG , Jianwei HUANG , Xiaoyun ZHENG , Weijie JIANG , Jiachen BO , Yucheng YANG

Front. Earth Sci., 2024, 18(3): 538–553. DOI: 10.1007/s11707-022-1079-x
RESEARCH ARTICLE


Abstract

Remote sensing image scene classification and remote sensing technology applications are hot research topics. Although CNN-based models have reached high average accuracy, some classes are still misclassified, such as "freeway," "sparse residential," and "commercial_area." These classes contain typical decisive features, spatial-relation features, and mixed decisive and spatial-relation features, which limit high-quality image scene classification. To address this issue, this paper proposes a Grad-CAM and capsule network hybrid method for image scene classification. The Grad-CAM and capsule network structures have the potential to recognize decisive features and spatial-relation features, respectively. By using a pre-trained model, a hybrid structure, and structure adjustment, the proposed model can recognize both decisive and spatial-relation features. A group of experiments is designed on three popular data sets with increasing classification difficulty. In the most challenging experiment, an average accuracy of 92.67% is achieved. Specifically, accuracies of 83%, 75%, and 86% are obtained for the "church," "palace," and "commercial_area" classes, respectively. This research demonstrates that the hybrid structure can effectively improve performance by considering both decisive and spatial-relation features. Therefore, Grad-CAM-CapsNet is a promising and powerful structure for image scene classification.


Keywords

image scene classification / CNN / Grad-CAM / CapsNet / DenseNet

Cite this article

Zhan HE, Chunju ZHANG, Shu WANG, Jianwei HUANG, Xiaoyun ZHENG, Weijie JIANG, Jiachen BO, Yucheng YANG. A Grad-CAM and capsule network hybrid method for remote sensing image scene classification. Front. Earth Sci., 2024, 18(3): 538–553. DOI: 10.1007/s11707-022-1079-x


1 Introduction

Remote sensing image scene classification aims to classify different scene images into different sections with explicit classes, that is, to understand different images and assign fine-grained specific semantics to different patches (Cheng et al., 2017a). Fine-grained specific semantics in images provide detailed information on objects, types, spatial relationships, and sequential relationships, which can greatly enhance fine-grained management and decision-making abilities in various fields, such as transportation, military, disaster monitoring, land resource management, and urban construction planning (Pan et al., 2020; Raiyani et al., 2021). Thus, remote sensing image scene classification and remote sensing technology applications are hot research topics.

Image scene classification has undergone two key development stages: the traditional machine learning stage and the deep learning stage (Cheng et al., 2017b; Pires de Lima and Marfurt, 2019). The traditional machine learning stage treats image scene classification as two independent parts: feature extraction and classification. For feature extraction, handcrafted features are selected with common methods such as the scale-invariant feature transform (SIFT), sparse representations, and the histogram of oriented gradients (HOG) (Sheng et al., 2012; Vo et al., 2015; Gan et al., 2016). For classification, logistic regression (LR), random forest (RF), and support vector machines (SVM) are used as classifiers (Knorn et al., 2009; Bai, 2016; Ahmed et al., 2020). However, combining handcrafted features with classical classifiers does not yield satisfactory results on complex scene images with abstract semantic information, because handcrafted features, such as texture features, are too low-level for complex scene classification (Zhao et al., 2016; Sun et al., 2021). Thus, high-level features must be considered in the next stage.

In the deep learning stage, a breakthrough method, the convolutional neural network (CNN), stands out with its automatic high-level feature generation mechanism, which has led to great achievements in remote sensing scene classification (Cheng et al., 2016; Cheng et al., 2018). Improvements have been made to the structure, such as multiple network connections, feature fusion, attention mechanisms, and other relevant techniques (Marmanis et al., 2016; Chaib et al., 2017; Tong et al., 2020; Zhao et al., 2020). CNN-based models have reached high accuracy, but their ability to identify the relative spatial position information between features is insufficient. The pooling layers and fully connected layers in a CNN are responsible for this inadequacy. After integrating features through pooling layers, a CNN gains translation invariance; this ability to focus only on features while ignoring the location information between them is suitable for tasks such as target segmentation and detection, but it significantly degrades performance on complex remote sensing image scene classification tasks. The fully connected layer further reduces the multidimensional features to one-dimensional features and loses the spatial position information of the features. Therefore, although CNN-based models have reached high accuracy, high interclass similarity still leads to misclassification, making scene classification a challenging task. For example, the classification accuracy of 'freeway' is only 73%, with 7% misclassified as 'railway,' and the classification accuracy of 'medium residential' is 77%, with 8% misclassified as 'sparse residential' and 'dense residential' (Yu and Liu, 2018b).

As shown in Fig.1, nine images (the top three are labeled 'freeway,' 'railway,' and 'runway'; the middle three are labeled 'dense residential,' 'medium residential,' and 'sparse residential'; the bottom three are labeled 'church,' 'palace,' and 'commercial area') are taken from the NWPU-RESISC45 data set. The top three images share similar components, such as roads and vegetation, but each scene contains decisive details, such as cars on the 'freeway' and trains on the 'railway.' The middle three images have a high degree of similarity in common objects such as houses, vegetation, and roads, but they contain rich and complex relationship details such as the density, spacing, and arrangement of the houses. The bottom three images have both decisive details and complex relationship details. According to the analyses of the two conditions above, two kinds of features caught our attention as possible reasons for the misclassification results. One is the decisive features around the target objects, because the ability of the CNN-based model to discriminate the decisive feature among several similar images is not strong enough. The other is the spatial relationship features between different target objects, because the fully connected layers in the CNN model ignore the relative spatial information. For example, the distance and arrangement between buildings are important factors for correctly classifying 'sparse residential,' 'medium residential,' and 'dense residential.' How to effectively solve these two problems together and improve the classification accuracy is what we discuss in this paper.

Two kinds of relevant studies have inspired us to improve the use of decisive features and spatial relationship features. The first is the attention mechanism, which can make the model focus on more effective information and ignore invalid information. Many attention mechanisms have been applied in deep learning models with improved classification performance (Zhao et al., 2017; Chen et al., 2018; Mei et al., 2019). Notably, a recently adjusted attention mechanism algorithm called gradient-weighted class activation mapping (Grad-CAM) achieved state-of-the-art performance (Li et al., 2020). Grad-CAM generates an attention map through a pre-trained model, and the pixel values in each attention map condense the more decisive features of the corresponding image. Thus, Grad-CAM can help achieve more accurate classification results. Many researchers have explored adding Grad-CAM to neural networks to achieve better results for different tasks, such as image-level weakly supervised semantic segmentation (Wang et al., 2020), mobile network design (Hou et al., 2021), and real-time semantic segmentation (Yu et al., 2021). The second is the capsule network (CapsNet), a novel method with a more effective ability to encode spatial information (Sabour et al., 2017). It uses a group of neurons, organized as a capsule, to replace the traditional single neuron. Each group of neurons constitutes a vector, and these vectors represent specific spatial features of specific entities in the image with respect to different attributes. This mechanism enables the network to analyze and recognize the relationships between different features in images and can therefore effectively improve the final classification accuracy (Zhang et al., 2019; Lei et al., 2021, 2022; Zhang et al., 2022).

We suspect that when a combination of Grad-CAM and CapsNet is used, the misclassifications caused by weakly captured decisive features and spatial relationship features can be corrected. Although the D-CapsNet model attempts to combine the attention mechanism and the CapsNet structure, it applies the attention mechanism within the convolutional layers, which ignores the original decisive details and complex relationship details (Raza et al., 2020). Thus, it still has lower accuracies in the classes with decisive details and complex relationship details, such as commercial area, medium residential, and church. To address this issue, this paper proposes a Grad-CAM and CapsNet hybrid method for remote sensing image scene classification.

The remainder of this paper is organized as follows. Section 2 introduces the proposed Grad-CAM and CapsNet hybrid method. Section 3 presents the experimental data sets, implementation details, and experimental results. Section 4 discusses the relevant factors to the proposed method. The conclusion is given in Section 5.

2 Grad-CAM and CapsNet hybrid method

For the problem of extracting decisive features, the attention mechanism can help the model focus more on the salient parts, so that the model captures the more useful features while ignoring the relatively useless ones. Additionally, CapsNet, which considers the relationships between different features that a traditional CNN ignores, helps to solve the problem of discovering relationship details. In this paper, we generate an attention map by using Grad-CAM and then use CapsNet rather than fully connected layers, combining the attention mechanism and CapsNet (Grad-CAM-CapsNet) to solve the two problems simultaneously.

2.1 Overall framework

As shown in Fig.2, the proposed Grad-CAM-CapsNet model is composed of an attention block, a feature fusion block, and a CapsNet block. First, in the attention block, we transfer a pre-trained CNN model that has been fully trained on the ImageNet1000 data set and use the Grad-CAM algorithm to generate attention maps from it. In the feature fusion block, convolution operations are performed on the original input image and the attention map in parallel to obtain two feature maps; the two feature maps are then fused by a multiplication operation, and the fused result is input to the CapsNet block. In the CapsNet block, the CapsNet model accepts the fused features from the previous part and performs further feature extraction to obtain the classification results. The whole process of the Grad-CAM-CapsNet method can be described as follows.

1) Attention block: First, the Grad-CAM algorithm and the pre-trained model are used to generate the attention map. Note that the generated attention map needs to be upsampled to obtain the same size as the original input image. In our method, the bilinear upsampling method is used.

2) Feature fusion block: Convolution operations on the input image and the attention map are performed in parallel. The former uses a pre-trained CNN model in which the weights of the first several layers are frozen as needed, and the latter uses a customized lightweight CNN model. The feature maps extracted by the two operations are multiplied to obtain the feature map masked by attention.

3) CapsNet block: Taking the feature map extracted in the previous step as input, features are extracted through the PrimaryCaps and DigitCaps layers, and the vector length (norm) of the output capsule of each category is calculated. The category with the largest vector length is the classification result. Note that the traditional capsule model also has a convolutional layer before PrimaryCaps; we removed this convolutional layer to avoid the feature redundancy caused by excessive convolution.

Tab.1 summarizes the complete Grad-CAM-CapsNet process. In the first step, we input image X into the pre-trained CNN and use Grad-CAM to obtain the attention map associated with the network prediction. In the second step, we use the pre-trained CNN and the customized CNN to extract features from input image X and attention map Xam, respectively; the fused feature map XA is then generated by multiplying the two extracted feature maps. In the last step, the fused feature map XA is input to CapsNet, which returns the probability P that input image X belongs to each class. The class with the highest probability is the classification result for the image.
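To make the data flow of Tab.1 concrete, the following is a minimal sketch of the feature fusion block, assuming a TensorFlow/Keras implementation (the paper does not publish code). The DenseNet121 backbone, the lightweight custom CNN for the attention map, and the multiplication of the two 8 × 8 × 1024 feature maps follow the description above; the exact filter counts of the custom CNN are illustrative assumptions. The fused output is what the CapsNet block then consumes.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

def build_fusion_block(input_shape=(256, 256, 3)):
    image = layers.Input(shape=input_shape, name="image")                    # original image X
    att_map = layers.Input(shape=input_shape[:2] + (1,), name="attention")   # Grad-CAM map X_am

    # Pre-trained branch: DenseNet121 features of the original image (8 x 8 x 1024).
    backbone = DenseNet121(include_top=False, weights="imagenet", input_shape=input_shape)
    feat_img = backbone(image)

    # Lightweight custom CNN branch for the attention map (filter counts are assumptions).
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(att_map)
    x = layers.Conv2D(512, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(1024, 3, strides=2, padding="same", activation="relu")(x)
    feat_att = layers.Conv2D(1024, 3, strides=4, padding="same", activation="relu")(x)  # match 8 x 8 x 1024

    # Fuse the two feature maps by element-wise multiplication (attention-masked features X_A).
    fused = layers.Multiply(name="attention_masked_features")([feat_img, feat_att])
    return Model(inputs=[image, att_map], outputs=fused)
```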

2.2 Grad-CAM structure

Grad-CAM uses the gradient of any target concept to generate a coarse localization map from the last convolutional layer, highlighting the areas with higher weights for correctly predicting the concept. In scene classification, a common problem is that there are always several images belonging to different classes with highly similar image content, which means that finding the salient parts that play a decisive role in correct classification is the key to improving accuracy. By using Grad-CAM, we can effectively generate attention maps that highlight these salient parts. Unlike CAM, Grad-CAM does not need to modify the original model structure or retrain the model, and it can be used with various types of CNNs. The main idea of Grad-CAM is to use the global average of the gradients to compute the weight of each feature map and then perform a weighted summation over all the feature maps to obtain the attention map.

The calculation process for generating the attention map using Grad-CAM is shown in Fig.3. First, $y^c$ is the probability that the image belongs to the $c$th class, which is output by the softmax classifier, and $A_{ij}^k$ is the pixel value at position $(i, j)$ in the $k$th feature map. For every feature map, we calculate the derivative of $y^c$ with respect to $A_{ij}^k$, sum the derivatives, and divide by the total number of pixels $Z$ to obtain the weight $\alpha_k^c$, which represents the relative importance coefficient between the $k$th feature map and the $c$th class, according to Eq. (1):

$$\alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}.\tag{1}$$

After obtaining the $\alpha_k^c$ of the $c$th class corresponding to all feature maps of the last layer in the pre-trained model, every feature map $A^k$ is multiplied by its corresponding weight $\alpha_k^c$, and all the results are added up to obtain the attention map $X_{am}$ by Eq. (2):

$$X_{am} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big).\tag{2}$$

A rectified linear unit (ReLU) is an activation function commonly used in CNN models to eliminate all negative values; it is calculated according to Eq. (3):

$$\mathrm{ReLU}(x) = \max(0, x).\tag{3}$$

Fig.4 shows the whole Grad-CAM implementation. Note that we selected a model with good classification accuracy on the ImageNet1000 data set (Abai and Rajmalwar, 2019) as the pre-trained CNN model. We resized the generated attention map (8 × 8) through upsampling to make it consistent with the shape of the input image (256 × 256), because our subsequent parallel feature extraction operations require the two outputs to have the same size. By visualizing the attention map at the same size as the original image, we can see specifically how the attention mechanism works.
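The procedure of Eqs. (1)–(3) can be written compactly with automatic differentiation. The sketch below is an assumption of how it could be implemented in TensorFlow/Keras with the stock ImageNet-trained DenseNet121; "relu" is the name of the final activation layer in the Keras DenseNet121 implementation and should be adjusted for other backbones, and the map is bilinearly resized back to the input size as described above.

```python
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121

model = DenseNet121(weights="imagenet")  # pre-trained classifier producing y^c
grad_model = tf.keras.Model(model.input,
                            [model.get_layer("relu").output, model.output])

def grad_cam(image_batch, class_index=None):
    """image_batch: (B, H, W, 3) images already passed through DenseNet preprocess_input."""
    with tf.GradientTape() as tape:
        feature_maps, preds = grad_model(image_batch)      # A^k and class scores
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))         # predicted class c
        y_c = preds[:, class_index]
    grads = tape.gradient(y_c, feature_maps)               # dy^c / dA^k_ij
    alpha = tf.reduce_mean(grads, axis=(1, 2))             # Eq. (1): global average of gradients
    cam = tf.einsum("bhwk,bk->bhw", feature_maps, alpha)   # weighted sum of feature maps
    cam = tf.nn.relu(cam)                                  # Eqs. (2)-(3)
    cam = tf.image.resize(cam[..., tf.newaxis],            # bilinear upsampling to input size
                          tf.shape(image_batch)[1:3])
    return cam
```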

Fig.5 shows the effect of superimposing the attention map on the original input image. It can be seen that the attention map considers the church building, palace building, and financial building to be the parts most useful for classifying the three respective types. Therefore, when the model performs feature extraction, the attention map helps it focus more on the information of each representative building while ignoring other information that is not useful.
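A superimposed view such as Fig.5 can be produced by drawing the attention map as a semi-transparent heatmap over the scene; a minimal sketch with matplotlib, assuming the `grad_cam` helper above, is:

```python
import matplotlib.pyplot as plt

def show_overlay(image, cam):
    """image: (H, W, 3) array scaled to [0, 1]; cam: (H, W) Grad-CAM attention map."""
    plt.imshow(image)
    plt.imshow(cam, cmap="jet", alpha=0.4)  # semi-transparent heatmap over the original scene
    plt.axis("off")
    plt.show()
```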

2.3 CapsNet structure

The capsule model is a novel deep learning network model proposed in recent years to solve the shortcomings of traditional CNNs. While retaining the advantages of CNNs, it considers the relative position information between features that CNNs ignore so that images can be processed more efficiently. The effectiveness of CapsNet has made it widely used in different research areas, such as object detection, semantic segmentation, brain tumor classification, and image classification (Tian et al., 2019a, 2019b).

As shown in Fig.6, CapsNet contains three parts: the convolution layer, the PrimaryCaps layer, and the DigitCaps layer. First, the convolution layer performs feature extraction on the original image to obtain a local convolutional feature output of size $H \times W \times L$, where $H$, $W$, and $L$ denote the height, width, and number of channels of the feature map. Then, the PrimaryCaps layer, a convolutional capsule layer, converts the output of the previous layer into capsule form and, through a reshape function, outputs a set of $D_1$-dimensional capsule vectors, where $D_1$ denotes the dimension of the capsules in the PrimaryCaps layer. Next, the DigitCaps layer is a fully connected capsule layer whose output is the category capsules: $S$ capsules of dimension $D_2$, where $S$ denotes the number of predicted classes and $D_2$ denotes the dimension of the capsules in the DigitCaps layer. Finally, by computing the length of each category capsule using the L2 norm according to Eq. (4), we can obtain the probability that the input belongs to each category:

$$L = \sqrt{(\alpha_1)^2 + (\alpha_2)^2 + \cdots + (\alpha_i)^2 + \cdots + (\alpha_{D_2})^2},\tag{4}$$

where $\alpha_i$ is the value in each dimension of one category capsule. The category corresponding to the maximum length is the final classification result.
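As a toy illustration of Eq. (4), assuming NumPy and illustrative sizes (S = 45 classes as in NWPU-RESISC45 and a 16-dimensional DigitCaps output, which the paper does not specify), the predicted class is simply the capsule with the largest L2 norm:

```python
import numpy as np

digit_caps = np.random.rand(45, 16)           # S category capsules, each a D2-dimensional vector
lengths = np.linalg.norm(digit_caps, axis=1)  # Eq. (4): L = sqrt(a_1^2 + ... + a_D2^2) per capsule
predicted_class = int(np.argmax(lengths))     # category with the maximum capsule length
```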

The dynamic routing mechanism used to connect the lower-level capsules in the PrimaryCaps layer with all higher-level capsules in the DigitCaps layer is the core of CapsNet. By this connection, CapsNet routes the data from the lower layer to the higher layer while keeping the relative spatial information constant throughout the process. The output of higher-level capsules is predicted by multiplying the output of lower-level capsules with the transformation matrix, and the capsules at higher levels will be active when predictions are consistent. The details of dynamic routing can be illustrated as follows.

First, all prediction vectors $\hat{u}_{j|i}$ of the lower-level capsules can be computed as $\hat{u}_{j|i} = W_{ij} u_i$, where $W_{ij}$ is the weight matrix between the lower layer and the higher layer, refined by back propagation, and $u_i$ is the output of lower-level capsule $i$. Then, in the higher-level capsules, the input vector $s_j$ is obtained via the weighted summation of the prediction vectors from all capsules in the lower layer, as in Eq. (5):

$$s_j = \sum_i c_{ij}\,\hat{u}_{j|i},\tag{5}$$

where $c_{ij}$ represents the coupling coefficient determined by the iterative dynamic routing process. $c_{ij}$ indicates the strength of the connection between lower-level capsule $i$ and higher-level capsule $j$ and increases when the prediction vector of the lower-level capsule has a high agreement with the output of the higher-level capsule. The sum of the coupling coefficients between lower-level capsule $i$ and all the capsules in the higher level is 1, and $c_{ij}$ is calculated by using the softmax function in Eq. (6):

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})},\tag{6}$$

where $b_{ij}$ is the log prior probability that capsule $i$ should be coupled with capsule $j$. $b_{ij}$ is usually initialized to 0 and then refined according to $b_{ij} \leftarrow b_{ij} + a_{ij}$, where $a_{ij}$ denotes the consistency between the current output $v_j$ of capsule $j$ in the higher level and the prediction vector $\hat{u}_{j|i}$ of capsule $i$ in the lower level, computed as $a_{ij} = v_j \cdot \hat{u}_{j|i}$.

The existence probability of the entity in the current input is represented by the length of the output vector vj of the capsule j in higher levels, so the capsule model uses a nonlinear squeeze function (squashing) to ensure that the short vectors are compressed to a length close to 0, and the long vectors are compressed to a length close to 1. The squashing function can be calculated according to Eq. (7):

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\,\frac{s_j}{\|s_j\|}.\tag{7}$$

All the equations mentioned above make up one routing process, as shown in the dynamic routing details in Fig.6. The routing algorithm consists of multiple iterations of the routing process, and the routing number represents the number of iterations.
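The routing loop of Eqs. (5)–(7) can be summarized in a short sketch. The NumPy version below is an assumption for illustration only: it takes the prediction vectors û_{j|i} = W_{ij}u_i as given and uses the conventional three routing iterations.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Eq. (7): compress short vectors toward length 0 and long vectors toward length 1."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """u_hat: (num_lower, num_higher, dim_higher) prediction vectors u_hat[i, j] = W_ij @ u_i."""
    num_lower, num_higher, _ = u_hat.shape
    b = np.zeros((num_lower, num_higher))                       # log priors b_ij initialized to 0
    for _ in range(num_iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # Eq. (6): softmax over higher capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)                   # Eq. (5): weighted sum of predictions
        v = squash(s)                                           # Eq. (7): squashed outputs v_j
        b = b + np.einsum("ijd,jd->ij", u_hat, v)               # agreement a_ij = v_j . u_hat_j|i
    return v
```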

The commonly used margin loss Lk is selected as the loss function in CapsNet, as shown in Eq. (8):

$$L_k = T_k\,\max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\,\max(0,\, \|v_k\| - m^-)^2,\tag{8}$$

where $k$ represents the class and $T_k$ represents the classification indicator function (its value is 1 when class $k$ exists and 0 when it does not). $m^+$ is the lower limit for a correct classification: when $\|v_k\| \in [m^+, 1]$, the model considers that the current input image belongs to class $k$. Similarly, $m^-$ is the upper limit for a classification error: when $\|v_k\| \in [0, m^-]$, the model considers that the current input image does not belong to class $k$. In this paper, we follow the literature (Lei et al., 2021) and set $m^+$, $m^-$, and $\lambda$ to 0.9, 0.1, and 0.5, respectively. The total loss is the sum of the losses of all capsules in the last layer.
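A small NumPy sketch of the margin loss in Eq. (8), assuming the values m⁺ = 0.9, m⁻ = 0.1, and λ = 0.5 given above; v_lengths holds ‖v_k‖ for every class and t is the one-hot indicator T_k:

```python
import numpy as np

def margin_loss(v_lengths, t, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Eq. (8): sum of the margin losses of all output capsules."""
    present = t * np.maximum(0.0, m_pos - v_lengths) ** 2              # true class pushed above m+
    absent = lam * (1 - t) * np.maximum(0.0, v_lengths - m_neg) ** 2   # other classes pushed below m-
    return float(np.sum(present + absent))
```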

The lengths and directions of the output vectors of lower-level capsules represent the existence probabilities and properties, respectively, of the specific entities corresponding to them. For example, when the network handles a 'palace' scene image, the lengths of the outputs encode the existence probabilities of different components of palace buildings, such as the main hall, eaves, and square, while the directions of the outputs encode properties of the palace, such as position, size, and orientation. This information is then routed to the higher-level capsules. The higher-level capsule ('palace'), whose output is an activity vector that encodes the scene context that 'palace' represents, only receives the information that has high similarity with its prediction. Therefore, capsules can make full use of the relative position information between different entities within a scene image.

3 Experiments and results

3.1 Experimental data sets

Three famous baseline data sets are used in our experiments: the UC Merced Land-Use data set (Yang and Newsam, 2010), the AID data set (Xia et al., 2017), and the NWPU-RESISC45 data set (Gong et al., 2017).

The UC Merced Land-Use data set contains 21 scene classes, where each class includes 100 scene images. The size of each scene image is 256 × 256 pixels, and the pixel resolution of each scene image is 0.3 m in the red-green-blue (RGB) color space. What makes it difficult to classify the data set is the severe overlap among some classes, such as sparse residential, dense residential, and medium residential.

The AID data set contains 30 scene classes, where each class includes 220 to 420 scene images. It is a large-scale data set in which each scene image is 600 × 600 pixels, larger than in the UC Merced Land-Use data set, which makes it more difficult to classify. The pixel resolution of each scene image varies approximately from 0.5 m to 8 m.

The NWPU-RESISC45 data set contains 45 classes, where each class includes 700 scene images. The size of each scene image is 256 × 256 pixels, and the pixel resolution of each scene image varies from approximately 0.3 m to 8 m in the RGB color space.

Note that, from most to least difficult to classify, the order of these data sets is NWPU-RESISC45, AID, and UC Merced Land-Use, because the NWPU-RESISC45 data set contains the most diverse image types. Thus, the experimental results on these data sets test the performance of the models hierarchically.

3.2 Implementation details

Implementation details include the setting details about the hyperparameters, hardware, and two core evaluation norms: overall accuracy and confusion matrix.

3.2.1 Settings

In the Grad-CAM stage, the pre-trained DenseNet121 (Zhou et al., 2016) is used to generate the attention map, and the weights in the first 312 layers are frozen. In the CapsNet stage, the hyperparameters are adjusted by using the Adam optimizer (Kingma and Ba, 2014), and the batch size is set to 50. The learning rate is set to 0.001 and is updated by multiplying it by 0.9 raised to the power of the epoch number; the number of epochs is set to 40. The margin loss in Eq. (8) is used as the loss function, and we set m+, m−, and λ to 0.9, 0.1, and 0.5, respectively.

To demonstrate that our method is effective in a fair comparison with other state-of-the-art methods, we use the same split ratios as the compared methods according to the data set distributions in (Yang and Newsam, 2010; Gong et al., 2017; Xia et al., 2017). For the UC Merced Land-Use data set, 80% of the images are used for training and 20% for testing. For the AID data set, 50% and 20% training ratios are used, with the rest used for testing. For the NWPU-RESISC45 data set, the experiments use 20% and 10% training ratios. Image data augmentation is used to expand the data; the rotation range, width shift range, and height shift range are set to 30, 0.1, and 0.1, respectively.
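The settings above translate into a short training configuration. The following sketch assumes TensorFlow/Keras; `model` stands for the assembled Grad-CAM-CapsNet and `margin_loss` for the loss of Eq. (8), both defined elsewhere.

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation: rotation, width shift, and height shift as described above.
augmenter = ImageDataGenerator(rotation_range=30,
                               width_shift_range=0.1,
                               height_shift_range=0.1)

# Adam optimizer with an initial learning rate of 0.001, decayed by 0.9 per epoch.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch, lr: 0.001 * 0.9 ** epoch)

# model.compile(optimizer=optimizer, loss=margin_loss)
# model.fit(augmenter.flow(x_train, y_train, batch_size=50),
#           epochs=40, callbacks=[lr_schedule])
```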

In terms of hardware, all the implementations are performed on a Windows 10 operating system with a 2.4 GHz 4-core i5-9300H CPU and 16 GB memory. An NVIDIA GTX1660Ti GPU with CUDA 10.2 is used to accelerate the computing.

3.2.2 Evaluation norms

The overall accuracy (OA) and confusion matrix are used to evaluate the designed experiments. OA is defined as the number of correctly classified images divided by the total number of labeled images and lies in the range of 0 to 1; it measures the overall performance of the models. The confusion matrix is an informative table that helps readers analyze the errors and confusions between classes, directly showing the classification performance of each class. This means that the confusion matrix of an experiment shows the performance on the separate classes.

In the calculation process, reliable OAs are achieved by ten repeated experiments according to the different training ratios for the three baseline data sets. The means and standard deviations of the OAs are also reported simultaneously. The confusion matrices are generated by using the classification results according to the ratios of 80%, 20%, and 10% for the UC Merced Land-Use data set, AID data set, and NWPU-RESISC45 data set.
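For reference, the two evaluation norms can be computed as follows; this is a minimal sketch assuming scikit-learn, with y_true and y_pred holding the labeled and predicted class indices of the test images.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def overall_accuracy(y_true, y_pred):
    """OA: correctly classified images divided by total labeled images (range 0 to 1)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def normalized_confusion_matrix(y_true, y_pred):
    """Row-normalized matrix: entry (i, j) is the fraction of class i images predicted as class j."""
    cm = confusion_matrix(y_true, y_pred).astype(float)
    return cm / cm.sum(axis=1, keepdims=True)
```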

3.3 Experimental results

To illustrate the validity of our proposed model, recent classification models with high image scene classification accuracy rates were selected. The models and their core features are listed in Tab.2. The experiments follow from the easiest data set to the hardest data set: UC Merced Land-Use data set, AID data set, and NWPU-RESISC45 data set.

3.3.1 UC Merced Land-Use Data set

The results on the UC Merced Land-Use data set are listed in Tab.3. Although the experimental models can be divided into two types (with and without attention or CapsNet), there are only slight differences in the results between models. Most of the models perform well on the UC Merced Land-Use data set, with accuracies over 95%, and some exceed 98%. In particular, the accuracy of our proposed model is over 99%, and all the models with attention mechanisms or CapsNet are over 98%. Even some of the models without an attention mechanism or CapsNet have accuracy rates over 98%. Although our accuracy is slightly higher than that of the other models, this does not highlight the advantages of the proposed model. In other words, the UC Merced Land-Use data set lacks the difficulty needed to show the advantages of our proposed model.

The confusion matrix of our proposed model also supports this inference (Fig.7). In the confusion matrix, both the horizontal and vertical coordinates represent the surface feature categories; the value on the diagonal represents the correct classification ratio of the category on the vertical axis; the other numbers represent the proportions of incorrect classifications of that category; and the horizontal coordinate of each such number indicates the category into which it was misclassified. For example, in Fig.7, the classification accuracy of the category "agricultural" is 95 percent, with 5 percent misclassified as "forest." The other confusion matrices have the same meaning. All class accuracies of the UC Merced Land-Use data set with our proposed model are nearly 1.00. Thus, further comparative experiments are presented on the following baseline data sets.

3.3.2 AID data set

The AID data set contains 30 scene classes, which is more than the UC Merced Land-Use data set with 21 classes. Extra scene classes mainly belong to complex building scenes with mixed decisive details and complex relationship details, such as commercial areas. Thus, outstanding classic models without attention or CapsNet structures, models with attention, models with CapsNet, and models with both attention and CapsNet are compared.

Tab.4 shows the results of different models on the AID data set. Three groups can clearly be distinguished at both the 50% and 20% training ratios. The first group includes the three selected models without attention or the CapsNet structure, which have accuracies of 86%−89% and 83%−86% at the 50% and 20% training ratios, respectively. The second group includes the two-stream fusion model and the VGG-16-CapsNet model; both achieve approximately 94% and 92% accuracy at the 50% and 20% training ratios, respectively. This demonstrates that the attention and CapsNet structures are useful for classifying image scenes. The third group includes the models with both attention and CapsNet structures, whose accuracies at the 50% and 20% training ratios are approximately 96% and 92%, respectively. There is a nearly 2% increase in performance compared with the single structures. This means that the combination of the attention mechanism and the CapsNet structure can effectively improve the performance of image scene classification.

Note that the differences in accuracy between our proposed method and the D-CapsNet model at the 50% and 20% training ratios are not the same. At the 50% training ratio, the proposed method is 0.28% (96.43% vs. 96.15%) higher than the D-CapsNet model. At the 20% training ratio, the proposed method is 0.95% (93.68% vs. 92.73%) higher than the D-CapsNet model. This shows that our method can obtain higher accuracy with a lower training ratio than the previous D-CapsNet model, indicating that our proposed method has better convergence performance. This is because the attention map of our model, generated by the pre-trained model, is more effective than the attention map of the D-CapsNet model, which is generated by convolutional layers.

For specific classes, the results are shown in the confusion matrix generated on the AID data set with a 20% training ratio (Fig.8). The meanings of the horizontal and vertical coordinates in Fig.8 are the same as those in Fig.7. Most of the classes perform well, with classification accuracies over 90%, especially the dense residential, sparse residential, and medium residential classes. These three classes have high similarity, which means that our proposed method can effectively identify these small interclass dissimilarities. Although the accuracies of school and resort are lower than 80% (72.8% and 77.0%), they are much higher than the scores of the D-CapsNet model (nearly 50%), which demonstrates that the performance of our proposed model surpasses the previous models both in overall accuracy and in the accuracy of specific classes.

3.3.3 NWPU-RESISC45 data set

The NWPU-RESISC45 data set contains 45 classes, more than twice as many as the UC Merced Land-Use data set, so it is a challenging test for the models. In the experiment on the NWPU-RESISC45 data set, four groups of models (without attention or CapsNet, attention only, CapsNet only, and attention & CapsNet) are selected and compared to verify the effectiveness of our proposed model.

Tab.5 lists the results of different models on the NWPU-RESISC45 data set with different training ratios. In general, our proposed model obtains the highest accuracies of 92.67±0.08 and 89.34±0.20 at the 20% and 10% training ratios, respectively. This demonstrates that our model achieves state-of-the-art performance in image scene classification.

Specifically, the "without attention or CapsNet" group has a gap to the top performance level, except for the "triple networks" model. The results not only verify that the hybrid structure is effective but also show that the pre-trained model applied in the "triple networks" model can improve performance on complex images. This is also the reason why our proposed model achieves higher performance than the D-CapsNet model in the same group. Moreover, we found that the results of the attention group and the CapsNet group are not ideal, which demonstrates that a model with a single attention mechanism or CapsNet structure cannot handle complex images very well. In contrast, our proposed model, with the hybrid structure and the pre-trained model, achieves the highest accuracy on the most difficult data set. To analyze the specific details in each class, the confusion matrix is given in Fig.9; its meaning is the same as above.

In the challenging experiment on the NWPU-RESISC45 data set with a 20% training ratio, our proposed model still obtains high accuracies in all classes: the accuracies of 89% (40/45) of the classes exceed 0.88. Note that the classes with decisive details and complex relationship details (church, palace, and commercial area) show a comprehensive improvement (Tab.6).

According to this comparison, the "attention & CapsNet" structure combines the strengths of the attention mechanism and the CapsNet structure: both the "church" and "commercial area" classes reach the highest performance. Note that the "palace" class obtains only 75% accuracy. After further investigation, the reason was found to be that the "palace" and "church" images have highly similar architectural styles and diverse spatial distributions; exploiting the surrounding environmental features may require different mechanisms to classify these complex scenes, which is a potential task for future research. Overall, the Grad-CAM and CapsNet hybrid method is a promising and powerful model for classifying image scenes in different tasks. The following discussion analyzes the effectiveness of the different parts of the model.

4 Discussion

Although the experimental results demonstrate that the Grad-CAM and CapsNet hybrid method achieves outstanding performance on three public benchmark data sets, it is still difficult to attribute the improvement to each part of the proposed structure. Thus, this discussion section presents ablation experiments to analyze the contributions of three key factors: the Grad-CAM mechanism, CapsNet, and the pre-trained model.

4.1 Effectiveness of the Grad-CAM mechanism

The Grad-CAM mechanism generates attention maps that make our model concentrate on the salient parts while ignoring the useless parts. To verify the effectiveness of the Grad-CAM mechanism, the models are compared on the different data sets. The results show that the Grad-CAM mechanism provides stable increments of +1.03%, +0.44%, and +0.33% on the different data sets (Tab.7).

Furthermore, the CapsNet in the above experiment is replaced to exclude the effect of CapsNet. For generality, an FC-layer classifier is used in place of CapsNet. Tab.8 compares the accuracy rates without and with the Grad-CAM mechanism. All the increments in Tab.8 support the conclusion that the Grad-CAM mechanism in our model has a positive effect. Note that a unified pre-trained model (DenseNet) is used in this experiment.

This proves that after we add the Grad-CAM attention mechanism to the network, the weighted localization map generated by the model can highlight the parts that have a decisive impact on correct classification, help the model learn the significant features of different ground objects in the remote sensing image classification task, and enhance the classification ability of the model for images with high similarity. Therefore, when the model performs feature extraction, attention is given to the representative information of different types of ground features, while other useless information is ignored. This is exactly what a CNN using pooling layers lacks.

4.2 Effectiveness of CapsNet

To verify the effectiveness of CapsNet, two groups of comparative experiments are set up. The first comparative experiment ("Grad-CAM-CapsNet" vs. "Grad-CAM-FC layer") is designed to test whether CapsNet affects performance. The left-hand results in Tab.9 show that CapsNet provides stable increments of +1.43%, +1.07%, and +1.61% on the different data sets, which means that CapsNet is effective for our proposed method. To eliminate the effect of Grad-CAM in the previous experiment, the second comparative experiment determines whether CapsNet is effective in the structure without the Grad-CAM mechanism. The right-hand results in Tab.9 list positive increments (+2.07%, +1.17%, and +4.13%) on the different data sets.

According to the above results, even though the attention mechanism causes the model to focus more on the core features, when these parts are still highly similar, a single attention mechanism is not enough to solve the problem. The capsule network uses output vectors to represent different attributes, including spatial information, and accounts for the spatial location information ignored by Grad-CAM, which plays an important role in distinguishing complex and similar image scenes. The capsule model can capture this information very well, which gives it a strong ability to retain, learn, and discriminate spatial information and thereby process images more effectively.

These comparative experiments not only verify the effect of the CapsNet structure on performance in different data sets and different structures but also support the effect of the Grad-CAM mechanism by comparing the results of “Grad-CAM-CapsNet” and “without Grad-CAM-CapsNet.”

4.3 Effectiveness of the pre-trained model

To demonstrate the effectiveness of the pre-trained model, further comparative experiments are designed between the methods with and without pre-trained DenseNet121. Specifically, the experiment uses a CNN composed of four convolution layers and one max-pooling layer, whose output size is the same as that of the pre-trained DenseNet121 (8 × 8 × 1024). In addition, the experiment compares the pre-trained DenseNet121 with and without weight freezing. In the case without weight freezing, the network freezes only the weights of the first 312 layers (the first three dense blocks) and sets the weights of the remaining layers (the last dense block) as trainable. Note that all the methods use the attention mechanism.
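The backbone variants compared in this ablation could be set up as in the sketch below, assuming TensorFlow/Keras. The plain CNN (four convolution layers and one max-pooling layer ending at 8 × 8 × 1024) and the freezing of the first 312 DenseNet121 layers follow the text, while the exact filter counts of the plain CNN are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential
from tensorflow.keras.applications import DenseNet121

def plain_cnn(input_shape=(256, 256, 3)):
    """Four convolution layers and one max-pooling layer, ending at an 8 x 8 x 1024 output."""
    return Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(256, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(512, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(1024, 3, strides=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),          # 256 -> 8 after four stride-2 convs and pooling
    ])

def densenet_backbone(freeze_first_n=312, input_shape=(256, 256, 3)):
    """Pre-trained DenseNet121 with the first `freeze_first_n` layers (three dense blocks) frozen."""
    backbone = DenseNet121(include_top=False, weights="imagenet", input_shape=input_shape)
    for layer in backbone.layers[:freeze_first_n]:
        layer.trainable = False
    return backbone
```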

The OAs of different models are shown in Tab.10. DenseNet-CapsNet without weight freezing provides a higher OA than DenseNet-CapsNet with weight freezing. Both methods perform much better than the CNN-CapsNet. This means that the pretrained model improves the performance of the structure. This is because the CNN model cannot directly obtain the various features without the pre-trained model.

Furthermore, it is noted that all the performances of “without weight freeze” are higher than the performances of “with weight freeze.” This is because the original task of the pre-trained model may be different from the classified scene images. Thus, adapted weights of the pre-trained model can be more suitable for model usage.

5 Conclusions

This paper proposes a Grad-CAM and capsule network hybrid method for remote sensing image scene classification to accurately identify objects with both decisive details and complex relationship details. It uses the Grad-CAM mechanism to enhance the ability of the method to identify the decisive details and uses a capsule network to enhance the ability of the method to identify the complex relationship details. After detailed experiments and discussions, three core conclusions are found.

1) The hybrid structure of Grad-CAM and the capsule network can effectively improve the performance of the classifier. The results show that the proposed Grad-CAM and capsule network hybrid method achieves a state-of-the-art performance of 99.05 ± 0.15 under an 80% training ratio on the UC Merced Land-Use data set.

2) The hybrid structure of Grad-CAM and the capsule network can improve the ability of the classifier on both decisive details and complex relationship details. According to the confusion matrices of the different data sets, "dense residential," "medium residential," and other classes with decisive details and complex relationship details achieve nearly 0.95 accuracy, which is much higher than the approximately 0.80 reported in previous studies.

3) Three factors affect the performance of the hybrid structure: the Grad-CAM mechanism, the capsule network, and the pre-trained model. This means that upgrading the attention mechanism, capsule network, or pre-trained model may improve the performance step by step. Thus, different pre-trained models, capsule networks, and attention maps generated by other mechanisms will be analyzed in the future to reveal the complementary mechanism of the hybrid structure. If this mechanism can be explained, the performance can be controlled by users for different applications and improved further.

References

[1]

Abai Z, Rajmalwar N (2019). DenseNet models for tiny ImageNet classification. arXiv preprint arXiv: 1904.10429

[2]

Ahmed A, Jalal A, Kim K (2020). A novel statistical method for scene classification based on multi-object categorization and logistic regression. Sensors (Basel), 20(14): 3871

[3]

Bai S (2016). Growing random forest on deep convolutional neural networks for scene categorization. Expert Systems with Applications, 71: 279–287

[4]

Castelluccio M, Poggi G, Sansone C (2015). Land use classification in remote sensing images by convolutional neural networks. arXiv preprint arXiv: 1508.00092

[5]

Chaib S, Liu H, Gu Y, Yao H (2017). Deep feature fusion for VHR remote sensing scene classification. IEEE Trans Geosci Remote Sens, 55(8): 4775–4784

[6]

Chen J, Wang C, Ma Z, Chen J, He D, Ackland S (2018). Remote sensing scene classification based on convolutional neural networks pre-trained using attention-guided sparse filters. Remote Sens (Basel), 10(2): 290

[7]

Cheng G, Han J, Lu X (2017a). Remote sensing image scene classification: benchmark and state of the art. Proc IEEE, 105(10): 1865–1883

[8]

Cheng G, Li Z, Yao X, Guo L, Wei Z (2017b). Remote sensing image scene classification using bag of convolutional features. IEEE Geosci Remote Sens Lett, 14(10): 1735–1739

[9]

Cheng G, Yang C, Yao X, Guo L, Han J (2018). When deep learning meets metric learning: remote sensing image scene classification via learning discriminative CNNs. IEEE Trans Geosci Remote Sens, 56(5): 2811–2821

[10]

Cheng G, Zhou P, Han J (2016). Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans Geosci Remote Sens, 54(12): 7405–7415

[11]

Fan R, Wang L, Feng R (2019). Attention based residual network for high-resolution remote sensing imagery scene classification. In: IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 1346–1349

[12]

Gan J, Li Q, Zhang Z, Wang J (2016). Two-level feature representation for aerial scene classification. IEEE Geosci Remote Sens Lett, 13(11): 1626–1630

[13]

Gong C, Han J, Lu X (2017). Remote sensing image scene classification: benchmark and state of the art. Proc IEEE, 105(10): 1865–1883

[14]

Hou Q, Zhou D, Feng J (2021). Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13713–13722

[15]

Kingma D P, Ba J (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv: 1412.6980

[16]

Knorn J, Rabe A, Radeloff V C, Kuemmerle T, Kozak J, Hostert P (2009). Land cover mapping of large areas using chain classification of neighboring Landsat satellite images. Remote Sens Environ, 113(5): 957–964

[17]

Lei R, Zhang C, Liu W, Zhang L, Zhang X, Yang Y, Huang J, Li Z, Zhou Z (2021). Hyperspectral remote sensing image classification using deep convolutional capsule network. IEEE J Sel Top Appl Earth Obs Remote Sens, 14: 8297–8315

[18]

Lei R, Zhang C, Zhang X, Huang J, Li Z, Liu W, Cui H (2022). Multiscale feature aggregation capsule neural network for hyperspectral remote sensing image classification. Remote Sens (Basel), 14(7): 1652

[19]

Li J, Lin D, Wang Y, Xu G, Zhang Y, Ding C, Zhou Y (2020). Deep discriminative representation learning with attention map for scene classification. Remote Sens (Basel), 12(9): 1366

[20]

Liu Y, Cheng M M, Hu X (2017). Richer convolutional features for edge detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5872–5881

[21]

Liu Y, Huang C (2018). Scene classification via triplet networks. IEEE J Sel Top Appl Earth Obs Remote Sens, 11(1): 220–237

[22]

Marmanis D, Datcu M, Esch T, Stilla U (2016). Deep learning earth observation classification using ImageNet pretrained networks. IEEE Geosci Remote Sens Lett, 13(1): 105–109

[23]

Mei X, Pan E, Ma Y, Dai X, Huang J, Fan F, Du Q, Zheng H, Ma J (2019). Spectral-spatial attention networks for hyperspectral image classification. Remote Sens (Basel), 11(8): 963

[24]

Pan Z, Xu J, Guo Y, Hu Y, Wang G (2020). Deep learning segmentation and classification for urban village using a worldview satellite image based on U-Net. Remote Sens (Basel), 12(10): 1574

[25]

Pires de Lima R, Marfurt K (2019). Convolutional neural network for remote-sensing scene classification: transfer learning analysis. Remote Sens (Basel), 12(1): 86

[26]

Raiyani K, Gonçalves T, Rato L, Salgueiro P, Marques da Silva J R (2021). Sentinel-2 image scene classification: a comparison between Sen2Cor and a machine learning approach. Remote Sens (Basel), 13(2): 300

[27]

Raza A, Huo H, Sirajuddin S, Fang T (2020). Diverse capsules network combining multiconvolutional layers for remote sensing image scene classification. IEEE J Sel Top Appl Earth Obs Remote Sens, 13: 5297–5313

[28]

Sabour S, Frosst N, Hinton G E (2017). Dynamic routing between capsules. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), 3859–3869

[29]

Sheng G, Yang W, Xu T, Sun H (2012). High-resolution satellite scene classification using a sparse coding based multiple feature combination. Int J Remote Sens, 33(8): 2395–2412

[30]

Sun X, Zhu Q, Qin Q (2021). A multi-level convolution pyramid semantic fusion framework for high-resolution remote sensing image scene classification and annotation. IEEE Access, 9: 18195–18208

[31]

Szegedy C, Ioffe S, Vanhoucke V, Alemi A A (2017). Inception-v4, inception-ResNet and the impact of residual connections on learning. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI'17). AAAI Press, 4278–4284

[32]

Tian T, Liu X, Wang L (2019a). Remote sensing scene classification based on Res-CapsNet. In: IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2019: 525–528

[33]

Tian X, An J, Mu G (2019b). Power system transient stability assessment method based on CapsNet. In: 2019 IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia). IEEE, 2019: 1159–1164

[34]

Tong W, Chen W, Han W, Li X, Wang L (2020). Channel-attention-based DenseNet network for remote sensing image scene classification. IEEE J Sel Top Appl Earth Obs Remote Sens, 13: 4121–4132

[35]

Vo T, Tran D, Ma W (2015). Tensor decomposition and application in image classification with histogram of oriented gradients. Neurocomputing, 165: 38–45

[36]

Wang Y, Zhang J, Kan M (2020). Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 12275–12284

[37]

Weng Q, Mao Z, Lin J, Guo W (2017). Land-use classification via extreme learning classifier based on deep convolutional features. IEEE Geosci Remote Sens Lett, 14(5): 704–708

[38]

Xia G S, Hu J, Hu F, Shi B, Bai X, Zhong Y, Zhang L, Lu X (2017). AID: a benchmark data set for performance evaluation of aerial scene classification. IEEE Trans Geosci Remote Sens, 55(7): 3965–3981

[39]

Yang Y, Newsam S (2010). Bag-of-visual-words and spatial extensions for land-use classification. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2010: 270–279

[40]

Yu C, Gao C, Wang J, Yu G, Shen C, Sang N (2021). BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation. Int J Comput Vis, 129(11): 3051–3068

[41]

Yu Y, Liu F (2018a). A two-stream deep fusion framework for high-resolution aerial scene classification. Comput Intell Neurosci, 2018: 8639367

[42]

Yu Y, Liu F (2018b). Dense connectivity based two-stream deep feature fusion framework for aerial scene classification. Remote Sens (Basel), 10(7): 1158

[43]

Zhang W, Tang P, Zhao L (2019). Remote sensing image scene classification using CNN-CapsNet. Remote Sens (Basel), 11(5): 494

[44]

Zhang X, Wang G, Zhao S G (2022). CapsNet-COVID19: lung CT image classification method based on CapsNet model. Math Biosci Eng, 19(5): 5055–5074

[45]

Zhao B, Zhong Y, Zhang L, Huang B (2016). The Fisher kernel coding framework for high spatial resolution scene classification. Remote Sens (Basel), 8(2): 157

[46]

Zhao D, Chen Y, Lv L (2017). Deep reinforcement learning with visual attention for vehicle classification. IEEE Trans Cogn Dev Syst, 9(4): 356–367

[47]

Zhao X, Zhang J, Tian J, Zhuo L, Zhang J (2020). Residual dense network based on channel-spatial attention for the scene classification of a high-resolution remote sensing image. Remote Sens (Basel), 12(11): 1887

[48]

Zhou B, Khosla A, Lapedriza A (2016). Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2921–2929

RIGHTS & PERMISSIONS

Higher Education Press
