Vision Mamba attention feature fusion UNet: an innovative state space model for accurate polyp segmentation

Bo Yang , Biyuan Li , Gaowei Sun , Jinying Ma

Optoelectronics Letters, 2026, 22(3): 187-192. DOI: 10.1007/s11801-026-4270-6
Abstract

Colorectal cancer (CRC) is a prevalent disease, and polyps are its precursors, so accurate polyp segmentation is crucial for early CRC prevention. However, polyps vary widely in size and often have indistinct boundaries, which makes accurate segmentation a challenging task. This paper proposes the vision Mamba attention feature fusion UNet (VMA-UNet), a U-shaped asymmetric encoder-decoder model grounded in the state space model (SSM). VMA-UNet incorporates attention feature fusion (AFF) to enhance the feature representation of small polyps. A new IUD loss function, which combines the intersection over union (IoU) loss and the Dice loss, is proposed to handle both large and small polyps and to mitigate data imbalance. Across multiple datasets, VMA-UNet demonstrates robust performance, particularly on small polyp segmentation, showcasing its practical value. The proposed network overcomes inherent shortcomings of convolutional neural networks (CNNs) and transformers: it models long-range interactions well while maintaining linear computational complexity. Our study introduces a new SSM-based method for polyp segmentation and advances the field.
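The abstract describes the IUD loss as a combination of the IoU loss and the Dice loss. The paper's exact weighting and smoothing terms are not given in this abstract, so the sketch below is only a minimal illustration of such a combination, assuming soft (probability-valued) predictions, an equal-weight sum, and a small `eps` for numerical stability — all of which are assumptions, not the authors' stated formulation.

```python
import numpy as np

def iou_loss(pred, target, eps=1e-6):
    # Soft IoU loss: 1 - |P ∩ G| / |P ∪ G|, computed on probability maps.
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss: 1 - 2|P ∩ G| / (|P| + |G|).
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def iud_loss(pred, target, alpha=0.5):
    # Hypothetical equal-weight combination of IoU and Dice losses;
    # the paper's actual weighting scheme may differ.
    return alpha * iou_loss(pred, target) + (1.0 - alpha) * dice_loss(pred, target)
```

A perfect prediction drives both terms to zero, while a completely disjoint prediction drives both toward one; combining the two lets the region-overlap penalty of IoU and the class-imbalance robustness of Dice act together, which is the motivation the abstract gives for addressing both large and small polyps.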

Cite this article

Bo Yang, Biyuan Li, Gaowei Sun, Jinying Ma. Vision Mamba attention feature fusion UNet: an innovative state space model for accurate polyp segmentation. Optoelectronics Letters, 2026, 22(3): 187-192. DOI: 10.1007/s11801-026-4270-6



RIGHTS & PERMISSIONS

Tianjin University of Technology
