Collections

Computer Vision and Pattern Recognition
  A selection of quality articles in the field of Computer Vision and Pattern Recognition
  • REVIEW ARTICLE
    Jian-Hao LUO, Wang ZHOU, Jianxin WU
    Frontiers of Computer Science, 2017, 11(1): 13-26. https://doi.org/10.1007/s11704-016-5514-6

    As one of the most classic fields in computer vision, image categorization has attracted widespread interest. Numerous algorithms have been proposed in the community, and many of them have advanced the state of the art. However, most existing algorithms are designed without considering the supply of computing resources, and therefore fail to give satisfactory results on resource-constrained tasks. In this paper, we provide a comprehensive and in-depth introduction to recent developments in image categorization research under resource constraints. While a large portion is based on our own work, we also give a brief description of other elegant algorithms. Furthermore, we investigate recent developments in deep neural networks, with a focus on resource-constrained deep nets.

  • RESEARCH ARTICLE
    Xin LIU, Meina KAN, Wanglong WU, Shiguang SHAN, Xilin CHEN
    Frontiers of Computer Science, 2017, 11(2): 208-218. https://doi.org/10.1007/s11704-016-6076-3

    Robust face representation is imperative for highly accurate face recognition. In this work, we propose an open-source face recognition method with deep representation, named VIPLFaceNet: a 10-layer deep convolutional neural network with seven convolutional layers and three fully connected layers. Compared with the well-known AlexNet, VIPLFaceNet requires only 20% of the training time and 60% of the testing time, yet achieves a 40% drop in error rate on the real-world face recognition benchmark LFW. VIPLFaceNet achieves 98.60% mean accuracy on LFW using a single network. An open-source C++ SDK based on VIPLFaceNet is released under the BSD license; it takes about 150 ms to process one face image in a single thread on an i7 desktop CPU. VIPLFaceNet provides a state-of-the-art starting point for both academic and industrial face recognition applications.

  • RESEARCH ARTICLE
    Kai CHEN, Guiguang DING, Jungong HAN
    Frontiers of Computer Science, 2017, 11(2): 219-229. https://doi.org/10.1007/s11704-016-6066-5

    Deep learning has been the most popular feature learning method used for a variety of computer vision applications in the past three years. Not surprisingly, this technique, especially the convolutional neural network (ConvNet) structure, has been exploited to recognize human actions, achieving great success. Most existing algorithms directly adopt the basic ConvNet structure, which works well in ideal situations, e.g., under stable lighting conditions. However, performance degrades significantly when large intra-class variation in image appearance occurs within the same category. To solve this problem, we propose a new method that integrates semantically meaningful attributes into deep learning's hierarchical structure. The basic idea is to add simple yet effective attributes at the category level of the ConvNet so that the attribute information can drive the learning procedure. Experimental results on three popular action recognition databases show that embedding auxiliary multiple attributes into the deep learning framework significantly improves classification accuracy.

  • RESEARCH ARTICLE
    Nan REN, Junping DU, Suguo ZHU, Linghui LI, Dan FAN, JangMyung LEE
    Frontiers of Computer Science, 2017, 11(2): 230-242. https://doi.org/10.1007/s11704-016-6050-0

    Visual tracking is a popular research area in computer vision that is difficult to realize because of challenges such as changes in scale and illumination, rotation, fast motion, and occlusion. Consequently, research in this area focuses on making tracking algorithms adapt to these changes, so as to achieve stable and accurate visual tracking. This paper proposes a visual tracking algorithm that integrates the scale invariance of SURF features with deep learning to enhance tracking robustness when the size of the tracked object changes significantly. A particle filter is used for motion estimation: the confidence of each particle is computed via a deep neural network, and the particle filter's result is verified and corrected by mean shift, chosen for its computational efficiency and insensitivity to external interference. Both qualitative and quantitative evaluations on challenging benchmark sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods across the challenging factors in visual tracking, especially scale variation.
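
    The motion-estimation loop described above (propagate particles, score them with an appearance model, resample, read off a weighted-mean estimate) can be sketched generically. This is a minimal illustration, not the paper's implementation: the `confidence` function stands in for the deep network's per-particle score, and the mean-shift correction step is omitted.

```python
import math
import random

def particle_filter_step(particles, confidence, motion_noise=2.0):
    """One generic particle-filter iteration for 2D position tracking."""
    # Propagate each particle with a simple random-walk motion model.
    moved = [(x + random.gauss(0, motion_noise),
              y + random.gauss(0, motion_noise)) for x, y in particles]
    # Score each particle; in the paper this is the deep network's confidence.
    weights = [confidence(p) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The state estimate is the weighted mean of the particles.
    estimate = (sum(w * px for w, (px, _) in zip(weights, moved)),
                sum(w * py for w, (_, py) in zip(weights, moved)))
    # Resample proportionally to the weights for the next frame.
    return random.choices(moved, weights=weights, k=len(moved)), estimate

# Toy usage: a static target at (50, 50) and a Gaussian confidence surface.
random.seed(0)
target = (50.0, 50.0)
confidence = lambda p: math.exp(-((p[0] - target[0]) ** 2 +
                                  (p[1] - target[1]) ** 2) / 50.0)
particles = [(random.uniform(40, 60), random.uniform(40, 60)) for _ in range(500)]
for _ in range(10):
    particles, estimate = particle_filter_step(particles, confidence)
```

    After a few iterations the particle cloud concentrates around the target, so the weighted-mean estimate lands close to (50, 50).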

  • RESEARCH ARTICLE
    Wei SHAO, Yi DING, Hong-Bin SHEN, Daoqiang ZHANG
    Frontiers of Computer Science, 2017, 11(2): 243-252. https://doi.org/10.1007/s11704-017-6538-2

    Protein subcellular localization prediction is important for studying the function of proteins. With the significant recent progress in microscopic imaging, automatically determining the subcellular localization of proteins from bio-images is becoming a new research hotspot. A central question in this field is which features are suitable for describing protein images. Existing feature extraction methods are usually hand-crafted and extract only one layer of features, which may be insufficient to represent complex protein images. To this end, we propose a deep-model-based descriptor (DMD) to extract high-level features from protein images. Specifically, to make the extracted features more generic, we first train a convolutional neural network (AlexNet) on a natural image set with millions of labeled images, and then use a partial parameter transfer strategy to fine-tune the parameters from natural images to protein images. After that, we apply the Lasso model to select the most distinguishing features from the last fully connected layer of the CNN, and use these selected features for the final classification. Experimental results on a protein image dataset validate the efficacy of our method.
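
    The Lasso-based selection step can be illustrated with a tiny coordinate-descent Lasso on synthetic data. This is a hedged sketch, not the paper's pipeline: the dimensions (50 rather than a 4096-dimensional fully connected layer), the synthetic "activations", and the `lasso_cd` helper are all illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=50):
    """Minimal coordinate-descent Lasso with soft-thresholding."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            # Soft-thresholding zeroes out weakly correlated features.
            w[j] = np.sign(rho) * max(abs(rho) - n * alpha, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
# Stand-in for deep-layer activations of 200 protein images (toy-sized dims).
X = rng.standard_normal((200, 50))
# Only three hidden dimensions actually drive the label.
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(200)

w = lasso_cd(X, y, alpha=0.1)
selected = np.flatnonzero(w)   # indices kept for the final classifier
```

    The L1 penalty drives the coefficients of irrelevant dimensions to exactly zero, leaving a small subset of informative features, which is the property the descriptor relies on.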

  • RESEARCH ARTICLE
    Ge SONG, Xiaoyang TAN
    Frontiers of Computer Science, 2017, 11(2): 253-265. https://doi.org/10.1007/s11704-017-6537-3

    We present a new method to generate efficient multi-level hashing codes for image retrieval based on a deep siamese convolutional neural network (DSCNN). Conventional deep hashing methods trade the capability of capturing highly complex, nonlinear semantic information of images against very compact hash codes, usually yielding high retrieval efficiency at the cost of accuracy. We relax the restrictive compactness requirement on hash codes by extending them to a two-level hierarchical coding scheme: the first level captures the high-level semantic information extracted by the deep network using a rich encoding strategy, while the second level squeezes it into more global and compact codes. At run time, instead of using the full first-level hash codes, an attention-based mechanism selects the bits most essential to each query image, guided by the hash codes generated at the second level and thus exploiting both local and global properties of the deep features. Experimental results on several popular datasets demonstrate the advantages of the proposed method over several state-of-the-art methods.
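
    The query-time idea (ranking by Hamming distance over a query-specific subset of essential bits) can be sketched with plain NumPy. Everything here is an assumption for illustration: `bit_importance` stands in for the attention weights that the paper derives from the second-level codes, and the database codes are random.

```python
import numpy as np

rng = np.random.default_rng(1)
n_db, n_bits = 1000, 128
# Hypothetical first-level (rich) binary codes for a 1000-image database.
db_codes = rng.integers(0, 2, size=(n_db, n_bits), dtype=np.uint8)

def retrieve(query_code, bit_importance, db, k=5, n_active=32):
    """Rank database codes by Hamming distance over the query's most
    important bits instead of the full code."""
    active = np.argsort(bit_importance)[-n_active:]          # essential bits
    dists = (db[:, active] != query_code[active]).sum(axis=1)
    return np.argsort(dists, kind="stable")[:k]

query = db_codes[42].copy()                 # query identical to entry 42
bit_importance = rng.random(n_bits)         # hypothetical attention scores
top = retrieve(query, bit_importance, db_codes)
```

    Comparing 32 bits instead of 128 cuts the per-candidate cost while, in the paper's scheme, the attention mechanism is meant to keep the bits that matter for this particular query.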

  • RESEARCH ARTICLE
    Hao ZHENG, Xin GENG
    Frontiers of Computer Science, 2017, 11(2): 266-275. https://doi.org/10.1007/s11704-016-5204-4

    Considering the distinctiveness of different group features in sparse representation, a novel joint multitask and weighted group sparsity (JMT-WGS) method is proposed. By weighting the popular group sparsity term, the representation coefficients from the same class over their associated dictionaries share similarity, while the representation coefficients from different classes retain sufficient diversity. The proposed method is cast into a multi-task framework with a two-stage iteration. In the first stage, the representation coefficients are optimized by the accelerated proximal gradient method with the weights fixed. In the second stage, the weights are computed from prior information about their entropy. Experimental results on three facial expression databases show that the proposed algorithm outperforms other state-of-the-art algorithms, demonstrating its promising performance.

  • RESEARCH ARTICLE
    Junge ZHANG, Kaiqi HUANG, Tieniu TAN, Zhaoxiang ZHANG
    Frontiers of Computer Science, 2017, 11(4): 632-648. https://doi.org/10.1007/s11704-016-5530-6

    Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses the problem of structure modeling and representation from two aspects: visual features and topology models. First, at the feature level, we propose the Local Structured Descriptor to capture an object's local structure effectively, and develop descriptors from shape and texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that our method achieves very competitive performance.

  • RESEARCH ARTICLE
    Le DONG, Ning FENG, Mengdie MAO, Ling HE, Jingjing WANG
    Frontiers of Computer Science, 2017, 11(4): 649-660. https://doi.org/10.1007/s11704-016-5558-7

    Efficient, interactive foreground/background segmentation of video is of great practical importance in video editing. This paper proposes an interactive and unsupervised video object segmentation algorithm named E-GrabCut, which targets both the segmentation quality and the time efficiency demanded in this field. The proposed algorithm has three features. First, we have developed a powerful, non-iterative version of the optimization process for each frame. Second, additional user interaction in the first frame is used to improve the Gaussian mixture model (GMM). Third, a robust algorithm for segmenting subsequent frames has been developed by reusing the previous GMM. Extensive experiments demonstrate that our method outperforms state-of-the-art video segmentation algorithms in terms of combined time efficiency and segmentation quality.
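
    The model-reuse idea can be sketched with color models fitted on user-marked pixels in the first frame and reused to label pixels in a later frame. This is a loose illustration under stated assumptions: a single Gaussian per region stands in for the paper's GMM, and the colors and pixel counts are synthetic.

```python
import numpy as np

def fit_color_model(pixels):
    """Fit one Gaussian color model per region (stand-in for a full GMM)."""
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels.T) + 1e-3 * np.eye(3)   # regularize for invertibility
    return mean, np.linalg.inv(cov), np.log(np.linalg.det(cov))

def log_score(pixels, model):
    """Unnormalized Gaussian log-likelihood of each pixel under the model."""
    mean, inv_cov, log_det = model
    d = pixels - mean
    return -0.5 * np.einsum("ij,jk,ik->i", d, inv_cov, d) - 0.5 * log_det

rng = np.random.default_rng(0)
# Frame 1: pixels the user marked as foreground (reddish) / background (bluish).
fg_model = fit_color_model(rng.normal([200, 60, 60], 15, (300, 3)))
bg_model = fit_color_model(rng.normal([60, 60, 200], 15, (300, 3)))

# A later frame: reuse the frame-1 models, no new user interaction needed.
frame_pixels = rng.normal([195, 65, 60], 15, (100, 3))   # still object-colored
is_foreground = log_score(frame_pixels, fg_model) > log_score(frame_pixels, bg_model)
```

    Because the object's color distribution changes slowly between frames, the models fitted once can classify subsequent frames, which is what makes the per-frame step cheap.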

  • RESEARCH ARTICLE
    Le DONG, Wenpu DONG, Ning FENG, Mengdie MAO, Long CHEN, Gaipeng KONG
    Frontiers of Computer Science, 2017, 11(6): 1023-1035. https://doi.org/10.1007/s11704-016-5538-y

    Color descriptors of an image are the most widely used visual features in content-based image retrieval systems. In this study, we present a novel color-based image retrieval framework that integrates color space quantization and feature coding. Although color features have advantages such as robustness and simple extraction, directly processing the abundant color information in an RGB image is challenging. To overcome this problem, a color space clustering quantization algorithm is proposed to obtain the clustering color space (CCS) by clustering the CIE 1976 L∗a∗b∗ space into 256 distinct colors, which adequately accommodate human visual perception. In addition, a new feature coding method called feature-to-character coding (FCC) is proposed to encode block-based main color features into character codes. In this method, images are represented by character codes, which makes it efficient to build an inverted index on color features and to utilize text-based search engines. Benefiting from its highly efficient computation, the proposed framework can also be applied to large-scale web image retrieval. The experimental results demonstrate that the proposed system significantly outperforms block-based main color image retrieval systems that use the traditional HSV (hue, saturation, value) quantization method.
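
    The two stages (cluster the color space into a palette, then encode each block's main color as a character) can be sketched as follows. This is a toy illustration: k is 4 rather than the paper's 256 CCS colors, the 3-D "Lab" samples are synthetic, and the letter mapping in `fcc_encode` is an assumption about the coding, not the paper's exact scheme.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain k-means, standing in for the clustering of CIE Lab space."""
    # Deterministic farthest-point initialization keeps the demo stable.
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(points[:, None] - np.array(centers)[None],
                           axis=2).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.linalg.norm(points[:, None] - centers[None],
                                axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def fcc_encode(block_colors, palette):
    """Feature-to-character coding: map each block's main color to one
    character, so an image becomes a string a text engine can index."""
    idx = np.linalg.norm(block_colors[:, None] - palette[None],
                         axis=2).argmin(axis=1)
    return "".join(chr(ord("a") + int(i)) for i in idx)

rng = np.random.default_rng(0)
# Synthetic Lab-like samples drawn around four well-separated colors.
true_colors = np.array([[30., 20., 20.], [80., -40., 40.],
                        [60., 60., -30.], [90., 0., 0.]])
samples = true_colors[rng.integers(0, 4, 400)] + rng.normal(0, 2, (400, 3))
palette = kmeans(samples, k=4)

blocks = true_colors[[0, 1, 1, 3, 2]]   # main colors of five image blocks
code = fcc_encode(blocks, palette)
```

    Two images with the same block-wise main colors produce identical strings, so an inverted index over these character codes behaves exactly like a text index over words.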