Improved image semantic segmentation algorithm based on EMA

Jiadong DU, Ting LI, Hongwei GE

Journal of Measurement Science and Instrumentation, 2024, Vol. 15, Issue (2): 185-194. DOI: 10.62756/jmsi.1674-8042.2024019
Signal and image processing technology


Abstract

The expectation-maximization attention (EMA) algorithm has two weaknesses: its parameters lack semantic correlation with the input image, and it pays no attention to inter-channel information. To address both, a dual attention network algorithm, EMA+, is proposed. It consists of two modules, a spatial attention module and a channel attention module. The spatial attention module uses the EMA algorithm as its main structure. In the responsibility estimation step, the feature map itself is used as the initial parameter of the expectation-maximization (EM) algorithm, which strengthens the semantic association between the parameter and the feature map. The channel attention module uses efficient channel attention (ECA), which applies a one-dimensional convolution to learn the interactions between channels. This avoids the dimensionality reduction that would otherwise break the direct correspondence between channels and their weights. By fusing the spatial and channel attention modules, EMA+ significantly improves performance on semantic segmentation tasks. Experimental results show that EMA+ achieves a better intersection-over-union than EMANet and other methods on PASCAL VOC 2012 and on some more complex datasets, and that it generalizes better.
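
The two modules described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The function names (`em_attention`, `eca`), the toy feature map, the choice of two pixel vectors as initial bases, the fixed kernel values, and the element-wise sum used for fusion are all assumptions made for illustration; the abstract does not specify how the feature map is turned into initial parameters or how the two branches are fused, and in ECA the kernel weights are learned rather than fixed.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def em_attention(pixels, bases, iterations=3):
    """Spatial branch: alternate E and M steps, then rebuild the pixels.

    pixels: N feature vectors of length C; bases: K vectors of length C.
    iterations must be at least 1.
    """
    n_channels = len(pixels[0])
    for _ in range(iterations):
        # E-step (responsibility estimation): soft assignment of pixels to bases.
        z = [softmax([dot(x, mu) for mu in bases]) for x in pixels]
        # M-step: each basis becomes the responsibility-weighted mean of the pixels.
        bases = [
            [sum(z_n[k] * x[c] for z_n, x in zip(z, pixels))
             / sum(z_n[k] for z_n in z)
             for c in range(n_channels)]
            for k in range(len(bases))
        ]
    # Reconstruction: every pixel is a convex combination of the bases.
    recon = [[sum(z_n[k] * bases[k][c] for k in range(len(bases)))
              for c in range(n_channels)] for z_n in z]
    return recon, bases

def eca(pixels, kernel):
    """Channel branch: 1-D convolution over per-channel means, no reduction."""
    n_channels = len(pixels[0])
    means = [sum(x[c] for x in pixels) / len(pixels) for c in range(n_channels)]
    pad = len(kernel) // 2  # odd kernel length assumed, zero padding at the ends
    weights = []
    for c in range(n_channels):
        s = sum(kernel[j] * means[c + j - pad]
                for j in range(len(kernel))
                if 0 <= c + j - pad < n_channels)
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate per channel
    return [[w * v for w, v in zip(weights, x)] for x in pixels], weights

# Toy feature map: N = 4 pixels, C = 2 channels (hypothetical values).
pixels = [[0.0, 1.0], [0.0, 3.0], [2.0, 1.0], [2.0, 3.0]]
# Assumption: initial bases are taken directly from the feature map.
init_bases = [pixels[0], pixels[-1]]
spatial, bases = em_attention(pixels, init_bases)
# Placeholder kernel; a learned kernel would be used in practice.
channel, weights = eca(pixels, kernel=[0.0, 1.0, 0.0])
# Assumption: the two branches are fused by element-wise summation.
fused = [[s + c for s, c in zip(s_row, c_row)]
         for s_row, c_row in zip(spatial, channel)]
```

Because each channel keeps its own gate in `eca`, the weight at index `c` is applied only to channel `c`, which is the direct channel-to-weight correspondence that the abstract says dimensionality reduction would break.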

Keywords

deep learning / image semantic segmentation / expectation-maximization attention (EMA) / dual attention network (DANet) / efficient channel attention (ECA)

Cite this article

Jiadong DU, Ting LI, Hongwei GE. Improved image semantic segmentation algorithm based on EMA. Journal of Measurement Science and Instrumentation, 2024, 15(2): 185-194. DOI: 10.62756/jmsi.1674-8042.2024019


