Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation
Zizhang WU, Yuanzhu GAN, Tianhao XU, Fan WANG
Front. Comput. Sci., 2024, Vol. 18, Issue (5): 185327
Transformer-based semantic segmentation approaches, which divide the image into regions using sliding windows and model the relations inside each window, have achieved outstanding success. However, previous work did not focus on modeling the relations between windows, so this information was not fully utilized. To address this issue, we propose Graph-Segmenter, an effective network comprising a graph transformer and a boundary-aware attention module. It simultaneously models the deeper relations between windows in a global view and between the pixels inside each window in a local view, and it performs substantial boundary adjustment at low cost. Specifically, we treat every window, and every pixel inside a window, as a node to construct graphs for both views, and devise the graph transformer accordingly. The boundary-aware attention module optimizes the edge information of target objects by modeling the relationships between pixels on object edges. Extensive experiments on three widely used semantic segmentation datasets (Cityscapes, ADE-20k and PASCAL Context) demonstrate that the proposed network, a graph transformer with boundary-aware attention, achieves state-of-the-art segmentation performance.
graph transformer / graph relation network / boundary-aware / attention / semantic segmentation
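The two-view graph construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`graph_attention`, `split_windows`, `two_view_update`), the mean pooling used to summarise a window, the plain dot-product edge weights, and the residual update are all assumptions chosen for illustration, and no learned projections are included. Pixels inside each window act as nodes of a local graph, and each window acts as a node of a global graph.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of similarity scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def graph_attention(nodes):
    # Assumption: edge weights are softmax-normalised scaled dot products
    # between node features, and each node is updated with a residual
    # weighted sum over all nodes of the (fully connected) graph.
    scale = math.sqrt(len(nodes[0]))
    adjacency = [softmax([dot(q, k) / scale for k in nodes]) for q in nodes]
    updated = [
        [n_c + sum(w * other[c] for w, other in zip(weights, nodes))
         for c, n_c in enumerate(node)]
        for node, weights in zip(nodes, adjacency)
    ]
    return updated, adjacency

def split_windows(feature_map, win):
    # Split an H x W x C nested list into non-overlapping win x win windows.
    # Assumes H and W are divisible by win. Each window is a flat list of
    # pixel feature vectors.
    h, w = len(feature_map), len(feature_map[0])
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append([feature_map[top + i][left + j]
                            for i in range(win) for j in range(win)])
    return windows

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def two_view_update(feature_map, win):
    windows = split_windows(feature_map, win)
    # Local view: the pixels inside each window are the graph nodes.
    local = [graph_attention(pixels)[0] for pixels in windows]
    # Global view: each window, summarised by its mean feature, is a node.
    window_nodes = [mean_vector(pixels) for pixels in local]
    global_nodes, window_adj = graph_attention(window_nodes)
    # Add each window's global context change back to all of its pixels.
    out = []
    for pixels, before, after in zip(local, window_nodes, global_nodes):
        delta = [a - b for a, b in zip(after, before)]
        out.append([[p + d for p, d in zip(pixel, delta)] for pixel in pixels])
    return out, window_adj

# Toy 4 x 4 feature map with 2 channels, split into four 2 x 2 windows.
feature_map = [[[float(r + c), float(r * c)] for c in range(4)] for r in range(4)]
out, window_adj = two_view_update(feature_map, win=2)
```

Here `window_adj` plays the role of the window-to-window relation graph, and `out` holds the pixel features of each window after both the local and the global update. A real model would presumably use learned query, key and value projections and multiple heads in place of the raw dot products used in this sketch.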
Higher Education Press