Monitoring coastal ecosystems is essential for mitigating pollution, preserving biodiversity, and understanding the impacts of climate change. However, existing approaches, such as fully convolutional networks (FCNs) and Transformer-based models, often struggle with low inter-class variance, difficulty in detecting small targets, and loss of boundary information. To handle large variations in target scale, we propose SDFSeg, a semantic segmentation framework that integrates three key modules: the scale-aware conv, the dynamic deformable sample, and the fusion perceiver. The scale-aware conv improves multiscale feature extraction by incorporating convolutional layers with varying dilation rates; the dynamic deformable sample aligns precisely with target boundaries and focuses on small features through adaptive dynamic sampling, improving small-target detection and boundary segmentation; and the fusion perceiver fuses local and global information. Extensive experiments on benchmark datasets demonstrate that our method achieves superior performance while reducing computational overhead, confirming its practical applicability.
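The abstract states that the scale-aware conv extracts multiscale features by combining convolutional layers with different dilation rates. The sketch below illustrates only that general idea. It assumes parallel single-channel 3×3 branches with dilation rates 1, 2 and 3, stacked along a channel axis. The branch count, the rates, the stacking, and the names `dilated_conv2d`, `multi_dilation_block` and `effective_kernel_size` are all hypothetical and are not SDFSeg's actual configuration. How SDFSeg fuses its branches is not stated here.

```python
import numpy as np


def effective_kernel_size(k: int, dilation: int) -> int:
    """Span (in pixels) covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """'Same'-padded 2D cross-correlation of a single-channel map with a
    square, odd-sized kernel whose taps are spaced `dilation` pixels apart."""
    k = kernel.shape[0]
    assert kernel.shape == (k, k) and k % 2 == 1, "expects an odd square kernel"
    pad = dilation * (k - 1) // 2            # output keeps the input's H x W
    xp = np.pad(x, pad)                       # zero padding on all sides
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # each kernel tap reads a shifted window of the padded input
            out += kernel[i, j] * xp[i * dilation:i * dilation + h,
                                     j * dilation:j * dilation + w]
    return out


def multi_dilation_block(x, kernels, dilations):
    """Hypothetical scale-aware block (an assumption, not SDFSeg's code):
    one branch per dilation rate, all run on the same input and stacked
    along a new channel axis for a later fusion step."""
    return np.stack([dilated_conv2d(x, kern, d)
                     for kern, d in zip(kernels, dilations)])


# Usage: an impulse input reveals the spatial footprint of each branch.
img = np.zeros((9, 9))
img[4, 4] = 1.0
dilations = (1, 2, 3)
feats = multi_dilation_block(img, [np.ones((3, 3))] * 3, dilations)
for d, f in zip(dilations, feats):
    rows = np.nonzero(f.any(axis=1))[0]
    # prints "1 3 3", then "2 5 5", then "3 7 7"
    print(d, effective_kernel_size(3, d), rows.max() - rows.min() + 1)
```

The impulse input makes the receptive-field growth visible. Each branch spreads a single nonzero pixel over a (2d+1)×(2d+1) window while still using only nine weights. This is why mixing dilation rates can cover several target scales at the cost of one 3×3 kernel per branch.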
Funding
National Natural Science Foundation of China (62401310)
RIGHTS & PERMISSIONS
The Author(s)