Referring image segmentation with attention guided cross modal fusion for semantic oriented languages
Qianli ZHOU, Rong WANG, Haimiao HU, Quange TAN, Wenjin ZHANG
[1] Mogadala A, Kalimuthu M, Klakow D. Trends in integration of vision and language research: a survey of tasks, datasets, and methods. Journal of Artificial Intelligence Research, 2021, 71
[2] Wu Y, Luo X, Yang Z. Semantic separator learning and its applications in unsupervised Chinese text parsing. Frontiers of Computer Science, 2013, 7(1): 55–68
[3] Margffoy-Tuay E, Pérez J C, Botero E, Arbeláez P. Dynamic multimodal instance segmentation guided by natural language queries. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 656–672
[4] Lei T, Zhang Y, Wang S I, Dai H, Artzi Y. Simple recurrent units for highly parallelizable recurrence. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2018, 4470–4481
[5] Zhang Y, Lei T. Training RNNs as fast as CNNs. See Openreview.net website. 2018
[6] Ye L, Rochan M, Liu Z, Wang Y. Cross-modal self-attention network for referring image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019, 10494–10503
[7] Jadon S. A survey of loss functions for semantic segmentation. In: Proceedings of the IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). 2020, 1–7
[8] Yu L, Poirson P, Yang S, Berg A C, Berg T L. Modeling context in referring expressions. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 69–85
[9] Mao J, Huang J, Toshev A, Camburu O, Yuille A, Murphy K. Generation and comprehension of unambiguous object descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, 11–20
[10] Kazemzadeh S, Ordonez V, Matten M, Berg T. ReferItGame: referring to objects in photographs of natural scenes. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, 787–798