Local structured representation for generic object detection

Junge ZHANG; Kaiqi HUANG; Tieniu TAN; Zhaoxiang ZHANG

doi:10.1007/s11704-016-5530-6


Front. Comput. Sci., 2017, Vol. 11, Issue 4: 632–648. DOI: 10.1007/s11704-016-5530-6

Computer Vision and Pattern Recognition - RESEARCH ARTICLE




Abstract

Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses structure modeling and representation from two aspects: the visual feature and the topology model. First, at the feature level, we propose the Local Structured Descriptor to capture an object's local structure effectively, and we develop descriptors from shape information and from texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that our method achieves very competitive performance.

Keywords

Local Structured Descriptor / Local Structured Model / Object Representation / Object Structure / Object Detection / PASCAL VOC

Cite this article


Junge ZHANG, Kaiqi HUANG, Tieniu TAN, Zhaoxiang ZHANG. Local structured representation for generic object detection. Front. Comput. Sci., 2017, 11(4): 632‒648 https://doi.org/10.1007/s11704-016-5530-6

