End-to-end dilated convolution network for document image semantic segmentation

Can-hui Xu , Cao Shi , Yi-nong Chen

Journal of Central South University, 2021, Vol. 28, Issue (6): 1765-1774. DOI: 10.1007/s11771-021-4731-9

Abstract

Semantic segmentation is a crucial step in document understanding. In this paper, semantic segmentation is implemented on an NVIDIA Jetson Nano-based platform as a vehicle for teaching artificial intelligence concepts and programming. To extract semantic structures from document images, we present an end-to-end dilated convolution network architecture. Dilated convolutions have the well-known advantage of extracting multi-scale context information without losing spatial resolution. Our model combines dilated convolutions with a residual network to represent image features and predict pixel labels. The convolution part works as a feature extractor to obtain multidimensional and hierarchical image features. The subsequent deconvolution part produces a full-resolution segmentation prediction. The predicted probability of each pixel decides its label among the predefined semantic classes. To understand segmentation granularity, we compare performance at three different levels. From fine-grained to coarse class levels, the proposed dilated convolution network architecture is evaluated on three document datasets. The experimental results show that both the imbalance of the semantic data distribution and the network depth are important factors that influence document semantic segmentation performance. The research is aimed at offering an educational resource for teaching artificial intelligence concepts and techniques.
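The two mechanisms named in the abstract, dilated convolution that keeps spatial resolution and per-pixel labeling from class probabilities, can be illustrated with a toy sketch. It is a 1D, pure-Python illustration under simplifying assumptions (zero padding, stride 1, a hand-picked kernel), not the authors' network, which operates on 2D images with a residual backbone. The function names `dilated_conv1d`, `receptive_field`, and `label_pixels` are hypothetical and chosen only for this example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded 1D dilated convolution (cross-correlation form).

    The output has the same length as the input, so no spatial
    resolution is lost, whatever the dilation rate.
    """
    k = len(kernel)
    span = dilation * (k - 1)  # distance between first and last kernel tap
    pad_left = span // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        field += d * (kernel_size - 1)
    return field


def label_pixels(class_probs):
    """Give each pixel the class with the highest probability.

    class_probs[c][i] is the probability of class c at pixel i.
    """
    n_classes = len(class_probs)
    n_pixels = len(class_probs[0])
    return [
        max(range(n_classes), key=lambda c: class_probs[c][i])
        for i in range(n_pixels)
    ]


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    kernel = [1.0, 1.0, 1.0]
    # Same output length for both dilation rates; only the reach changes.
    print(dilated_conv1d(impulse, kernel, dilation=1))
    print(dilated_conv1d(impulse, kernel, dilation=2))
    # Three 3-tap layers with dilations 1, 2, 4 cover 15 input positions.
    print(receptive_field(3, [1, 2, 4]))
    # Two classes, three pixels.
    print(label_pixels([[0.7, 0.2, 0.5], [0.3, 0.8, 0.5]]))
```

With a dilation of 2 the three kernel taps touch positions two apart, so the same number of weights sees a wider context while the output keeps the input length. Stacking layers with growing dilation rates widens the receptive field quickly without any downsampling.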

Keywords

semantic segmentation / document images / deep learning / NVIDIA Jetson Nano

Cite this article

Can-hui Xu, Cao Shi, Yi-nong Chen. End-to-end dilated convolution network for document image semantic segmentation. Journal of Central South University, 2021, 28(6): 1765-1774. DOI: 10.1007/s11771-021-4731-9

