Label distribution learning for scene text detection

Haoyu MA; Ningning LU; Junjun MEI; Tao GUAN; Yu ZHANG; Xin GENG

doi:10.1007/s11704-022-1446-5

Front. Comput. Sci., 2023, Vol. 17, Issue 6: 176339

Excellent Young Computer Scientists Forum

RESEARCH ARTICLE

Abstract

Recently, segmentation-based scene text detection has drawn wide research interest because it can flexibly describe scene text instances of arbitrary shapes, such as curved text. However, existing methods usually need complex post-processing stages to handle ambiguous labels, i.e., the labels of pixels near the text boundary, which may belong to either the text or the background. In this paper, we present a framework for segmentation-based scene text detection that learns from ambiguous labels. We use label distribution learning to handle the label ambiguity of text annotations, which achieves good performance without an additional post-processing stage. Experiments on benchmark datasets demonstrate that our method produces better results than state-of-the-art segmentation-based scene text detection methods.
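The abstract does not say how the label distributions are constructed. The following is a minimal sketch of the general idea under one stated assumption: each pixel's hard 0/1 text label is softened according to its distance from the annotated text boundary. This is a common heuristic and not necessarily the authors' construction. The resulting per-pixel distributions are then compared with predictions using a KL-divergence loss, the usual objective in deep label distribution learning. All function names and the `sigma` parameter are hypothetical.

```python
import math
from typing import List

Grid = List[List[int]]
Dist = List[float]  # [P(background), P(text)] for one pixel


def signed_boundary_distance(mask: Grid) -> List[List[float]]:
    """Signed pixel distance to the text/background boundary (hypothetical helper).

    Positive inside text, negative in background. The boundary is assumed to sit
    halfway between neighbouring text and background pixels, so the two pixels
    straddling it get +0.5 and -0.5. Brute force, O(N^2): illustration only.
    """
    h, w = len(mask), len(mask[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    out = [[0.0] * w for _ in range(h)]
    for r, c in coords:
        label = mask[r][c]
        opposite = [math.hypot(r - rr, c - cc)
                    for rr, cc in coords if mask[rr][cc] != label]
        if not opposite:  # image is a single class: no boundary at all
            out[r][c] = math.inf if label else -math.inf
            continue
        d = min(opposite) - 0.5
        out[r][c] = d if label else -d
    return out


def _sigmoid(x: float) -> float:
    # Numerically stable logistic; also maps +inf -> 1.0 and -inf -> 0.0.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_text_labels(mask: Grid, sigma: float = 1.0) -> List[List[Dist]]:
    """Replace hard 0/1 labels with per-pixel distributions [P(bg), P(text)].

    Assumption (not from the paper): pixels deep inside or far outside a text
    instance stay close to one-hot, while pixels near the boundary get
    distributions closer to [0.5, 0.5]. `sigma` sets the width of that band.
    """
    labels = []
    for row in signed_boundary_distance(mask):
        probs = [_sigmoid(s / sigma) for s in row]
        labels.append([[1.0 - p, p] for p in probs])
    return labels


def kl_divergence(target: Dist, pred: Dist, eps: float = 1e-12) -> float:
    """KL(target || pred) for one pixel; classes with zero target mass add 0."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, pred) if t > 0)


def ldl_loss(targets: List[List[Dist]], preds: List[List[Dist]]) -> float:
    """Mean per-pixel KL divergence between target and predicted distributions."""
    terms = [kl_divergence(t, p)
             for t_row, p_row in zip(targets, preds)
             for t, p in zip(t_row, p_row)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    targets = soft_text_labels(mask, sigma=1.0)
    for row in targets:
        print(" ".join(f"{p_text:.2f}" for _, p_text in row))
```

With `sigma=1.0`, the centre of the 3×3 text block is the most text-like pixel. The other text pixels score just above 0.5, and background pixels touching the block score just below 0.5. The ambiguity is therefore carried by the training target instead of being resolved by post-processing. Brute-force distances keep the sketch dependency-free. A practical implementation would more likely use a distance transform such as OpenCV's `cv2.distanceTransform`.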

Keywords

scene text detection / multi-task learning / label distribution learning

Cite this article

Haoyu MA, Ningning LU, Junjun MEI, Tao GUAN, Yu ZHANG, Xin GENG. Label distribution learning for scene text detection. Front. Comput. Sci., 2023, 17(6): 176339 https://doi.org/10.1007/s11704-022-1446-5

Haoyu Ma is currently a master's candidate in computer science at Southeast University, China. He received his BS degree from Capital Normal University, China in 2018. His research interests include machine learning, pattern recognition, and scene text detection.

Ningning Lu received the BSc degree in mechanical design and automation from Hefei University of Technology, China in 2010 and the MSc degree in computer science from Southeast University, China in 2021. His research interests include machine learning, pattern recognition, computer vision, and cyber security.

Junjun Mei is ZTE's chief R&D engineer in the field of audio and video, engaged in research on the overall architecture of integrated video cloud networks and on key technologies such as computer vision, audio and video coding, and audio and video transmission, and has presided over the R&D and design of a number of system solutions.

Tao Guan is a senior system architect at ZTE, China, mainly engaged in architecture design and algorithm research for video systems and industrial digital systems. Guan has participated in standards organizations, initiated and drafted a number of communication standards, and applied for more than 20 national invention patents.

Yu Zhang is currently an associate professor with the School of Computer Science and Engineering, Southeast University, China. He received his BS and MS degrees in telecommunications engineering from Xidian University, China in 2001 and 2004, respectively, and his PhD degree from Nanyang Technological University, Singapore in 2014. His research areas include computer vision, machine learning, object recognition, video analysis, human action analysis, and 3D pose estimation.

Xin Geng received the BS and MS degrees in computer science from Nanjing University, China in 2001 and 2004, respectively, and the PhD degree from Deakin University, Australia in 2008. He joined the School of Computer Science and Engineering at Southeast University, China in 2008, and is currently a professor and vice dean of the school. His research interests include pattern recognition, machine learning, and computer vision. He has authored over 50 refereed papers and holds five patents in these areas.

Acknowledgements

This work was supported by the National Key R&D Program of China (2018AAA0100104, 2018AAA0100100), the National Natural Science Foundation of China (Grant No. 61702095), and the Natural Science Foundation of Jiangsu Province (BK20211164).