Robust visual semantic perception for flexible grinding of complex welds

Junjun Wu , Weikun Qiu , Jinjia Huang , Haichu Chen

Biomimetic Intelligence and Robotics ›› 2026, Vol. 6 ›› Issue (1): 100288
DOI: 10.1016/j.birob.2026.100288

Research Article

Abstract

Semantic segmentation methods based on RGB images exhibit notable limitations in complex industrial scenarios, particularly in addressing interference factors such as dynamic lighting variations and polymorphic weld seam morphologies, which lead to insufficient feature extraction capabilities and reduced segmentation accuracy and robustness. To address these limitations, this study proposes a polymorphic weld seam semantic segmentation model (PWSM) based on multi-level feature fusion, which effectively integrates the informational advantages of RGB and depth images to enhance perceptual capabilities in complex environments. The proposed model introduces a Dual-Stream Dual-modal Fusion (DSDF) module that employs channel selection and spatial selection strategies to extract and enhance complementary features from RGB and depth images. Concurrently, a Multi-Level Feature Fusion Module (ML-FFM) is developed to progressively integrate low-level and high-level semantic information through a multi-scale mechanism, refining boundary features while preserving the integrity of feature representation. Experimental results demonstrate that the model achieves superior segmentation performance on a complex multi-form weld seam dataset, particularly showing enhanced accuracy and robustness in challenging scenarios involving occlusions and illumination variations. Compared with existing single-modal and multi-modal models, the proposed model achieves performance improvements of 1.52% and 0.65%, respectively, providing effective technical support for intelligent perception of polymorphic weld seams.
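The channel- and spatial-selection fusion strategy attributed to the DSDF module can be illustrated with a toy NumPy sketch. The specific gates used here (sigmoid over global-average-pooled channel statistics, sigmoid over the channel-wise mean for the spatial map) and the additive fusion are illustrative assumptions for exposition, not the paper's actual layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_select(feat):
    # Channel selection: global average pool -> per-channel gate in (0, 1).
    gate = sigmoid(feat.mean(axis=(1, 2)))        # shape (C,)
    return feat * gate[:, None, None]

def spatial_select(feat):
    # Spatial selection: channel-wise mean -> per-pixel gate in (0, 1).
    gate = sigmoid(feat.mean(axis=0))             # shape (H, W)
    return feat * gate[None, :, :]

def dsdf_fuse(rgb_feat, depth_feat):
    """Toy dual-stream fusion: gate each modality's feature map by its
    channel and spatial selections, then combine additively so the two
    modalities contribute complementary evidence."""
    rgb_sel = spatial_select(channel_select(rgb_feat))
    dep_sel = spatial_select(channel_select(depth_feat))
    return rgb_sel + dep_sel

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 16, 16))   # (C, H, W) RGB-stream features
dep = rng.standard_normal((8, 16, 16))   # (C, H, W) depth-stream features
fused = dsdf_fuse(rgb, dep)
print(fused.shape)                       # (8, 16, 16)
```

In a real network these gates would be learned (e.g. small convolutions or MLPs rather than parameter-free pooling), but the sketch captures the idea of selecting complementary channels and spatial regions from each modality before fusing.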

Keywords

Visual semantic perception / Robustness / Adaptive / Polymorphic weld seams / Flexible grinding robot

Cite this article

Junjun Wu, Weikun Qiu, Jinjia Huang, Haichu Chen. Robust visual semantic perception for flexible grinding of complex welds. Biomimetic Intelligence and Robotics, 2026, 6(1): 100288 DOI:10.1016/j.birob.2026.100288


CRediT authorship contribution statement

Junjun Wu: Writing – review & editing, Writing – original draft, Resources, Project administration, Methodology, Formal analysis, Conceptualization. Weikun Qiu: Writing – review & editing, Visualization, Software, Project administration, Data curation. Jinjia Huang: Writing – original draft, Visualization, Validation, Software, Project administration, Methodology, Data curation, Conceptualization. Haichu Chen: Supervision, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Key R&D Program of China (2022YFB4702300), in part by the National Natural Science Foundation of China (62273097), in part by the Guangdong Basic and Applied Basic Research Foundation (2025A1515010194), in part by the Key Areas Special Project for Scientific Research in Universities and Colleges of Guangdong Province (2025ZDZX3031), in part by the Guangdong Province Science and Technology Plan Project (2022A0505050017), in part by the Foshan Key Area Technology Research Foundation (2120001011009), in part by the Scientific Research Project of Guangdong Provincial Administration for Market Regulation (2025CT08), and in part by the Research Project of Guangdong Provincial Institute of Special Equipment Inspection (2024JD205).

