SDFSeg: multiscale perception and deformable feature fusion for coastal ecosystem
Xinjing Wang, Ziying Wu, Yuwen Wang, Haomiao Zhang, Shiyi Han, Ying Gao
Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1
Monitoring coastal ecosystems is essential for mitigating pollution, preserving biodiversity, and understanding the impacts of climate change. However, existing approaches, including fully convolutional network (FCN)-based and Transformer-based models, often struggle with low class variance, difficulty in detecting small targets, and loss of boundary information. To address these problems and handle large variations in target scale, we propose a semantic segmentation framework, SDFSeg, which integrates three key modules: the scale aware conv, the dynamic deformable sample, and the fusion perceiver. The scale aware conv improves multiscale feature extraction by incorporating convolutional layers with varying dilation rates. The dynamic deformable sample aligns target boundaries precisely, focuses on small features, and samples adaptively, which improves small-target detection and boundary segmentation. The fusion perceiver fuses local and global information. Extensive experiments on benchmark datasets demonstrate that our method achieves superior performance while reducing computational overhead, confirming its practical applicability.
Semantic segmentation / Multiscale feature extraction / Coastal ecosystem monitoring / Boundary segmentation
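The abstract states that the scale aware conv combines convolutional layers with varying dilation rates. The following is a minimal one-dimensional sketch of that general idea, not the authors' module: the function names, the dilation rates (1, 2, 4), the 3-tap kernel, and the averaging used to merge the branches are all assumptions made for illustration.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D 'same'-size cross-correlation with zero padding and a dilation rate."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - center) * dilation
            # Positions outside the signal are treated as zeros.
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def multiscale_branches(signal: Sequence[float], kernel: Sequence[float],
                        dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Run parallel dilated branches and average them (assumed fusion rule)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(vals) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    kernel = [1.0, 1.0, 1.0]
    for d in (1, 2, 4):
        print(d, receptive_field(len(kernel), d), dilated_conv1d(impulse, kernel, d))
    print(multiscale_branches(impulse, kernel))
```

With a fixed 3-tap kernel, a larger dilation rate widens the span of input each output sees without adding weights, which is why parallel branches at several rates can respond to targets of different sizes.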