Collections

Computer Vision and Pattern Recognition
  A selection of quality articles in the field of Computer Vision and Pattern Recognition
  • REVIEW ARTICLE
    Jian-Hao LUO, Wang ZHOU, Jianxin WU
    Frontiers of Computer Science, 2017, 11(1): 13-26. https://doi.org/10.1007/s11704-016-5514-6

    As one of the most classic fields in computer vision, image categorization has attracted widespread interest. Numerous algorithms have been proposed in the community, and many of them have advanced the state of the art. However, most existing algorithms are designed without considering the supply of computing resources, and therefore fail to give satisfactory results on resource-constrained tasks. In this paper, we provide a comprehensive and in-depth introduction to recent developments in image categorization research under resource constraints. While a large portion is based on our own work, we also give a brief description of other elegant algorithms. Furthermore, we investigate recent developments in deep neural networks, with a focus on resource-constrained deep nets.

  • RESEARCH ARTICLE
    Xin LIU, Meina KAN, Wanglong WU, Shiguang SHAN, Xilin CHEN
    Frontiers of Computer Science, 2017, 11(2): 208-218. https://doi.org/10.1007/s11704-016-6076-3

    Robust face representation is imperative for highly accurate face recognition. In this work, we propose an open-source face recognition method with deep representation, named VIPLFaceNet: a 10-layer deep convolutional neural network with seven convolutional layers and three fully connected layers. Compared with the well-known AlexNet, VIPLFaceNet requires only 20% of the training time and 60% of the testing time, yet achieves a 40% drop in error rate on the real-world face recognition benchmark LFW. VIPLFaceNet achieves 98.60% mean accuracy on LFW using a single network. An open-source C++ SDK based on VIPLFaceNet is released under the BSD license; it takes about 150 ms to process one face image in a single thread on an i7 desktop CPU. VIPLFaceNet provides a state-of-the-art starting point for both academic and industrial face recognition applications.

  • RESEARCH ARTICLE
    Kai CHEN, Guiguang DING, Jungong HAN
    Frontiers of Computer Science, 2017, 11(2): 219-229. https://doi.org/10.1007/s11704-016-6066-5

    Deep learning has been the most popular feature learning method used for a variety of computer vision applications in the past three years. Not surprisingly, this technique, especially the convolutional neural network (ConvNet) structure, has been exploited to recognize human actions, achieving great success. Most existing algorithms directly adopt the basic ConvNet structure, which works well in ideal situations, e.g., under stable lighting conditions. However, performance degrades significantly when large intra-class variation in image appearance occurs within the same category. To solve this problem, we propose a new method that integrates semantically meaningful attributes into deep learning's hierarchical structure. The basic idea is to add simple yet effective attributes at the category level of the ConvNet so that the attribute information can drive the learning procedure. Experimental results on three popular action recognition databases show that embedding auxiliary multiple attributes into the deep learning framework significantly improves classification accuracy.

  • RESEARCH ARTICLE
    Nan REN, Junping DU, Suguo ZHU, Linghui LI, Dan FAN, JangMyung LEE
    Frontiers of Computer Science, 2017, 11(2): 230-242. https://doi.org/10.1007/s11704-016-6050-0

    Visual tracking is a popular research area in computer vision that is difficult to realize because of challenges such as changes in scale and illumination, rotation, fast motion, and occlusion. Consequently, research in this area focuses on making tracking algorithms adapt to these changes, so as to achieve stable and accurate visual tracking. This paper proposes a visual tracking algorithm that integrates the scale invariance of SURF features with deep learning to enhance tracking robustness when the size of the tracked object changes significantly. A particle filter is used for motion estimation: the confidence of each particle is computed via a deep neural network, and the particle filter's result is verified and corrected by mean shift, chosen for its computational efficiency and insensitivity to external interference. Both qualitative and quantitative evaluations on challenging benchmark sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods across the challenging factors in visual tracking, especially scale variation.
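
    The motion-estimation loop described above (propagate particles, score them with an appearance model, resample, read off a weighted-mean estimate) can be sketched generically. This is a minimal illustration, not the paper's implementation: the `confidence` function stands in for the deep network's per-particle score, and the mean-shift correction step is omitted.

```python
import math
import random

def particle_filter_step(particles, confidence, motion_noise=2.0):
    """One generic particle-filter iteration for 2D position tracking."""
    # Propagate each particle with a simple random-walk motion model.
    moved = [(x + random.gauss(0, motion_noise),
              y + random.gauss(0, motion_noise)) for x, y in particles]
    # Score each particle; in the paper this is the deep network's confidence.
    weights = [confidence(p) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The state estimate is the weighted mean of the particles.
    estimate = (sum(w * px for w, (px, _) in zip(weights, moved)),
                sum(w * py for w, (_, py) in zip(weights, moved)))
    # Resample proportionally to the weights for the next frame.
    return random.choices(moved, weights=weights, k=len(moved)), estimate

# Toy usage: a static target at (50, 50) and a Gaussian confidence surface.
random.seed(0)
target = (50.0, 50.0)
confidence = lambda p: math.exp(-((p[0] - target[0]) ** 2 +
                                  (p[1] - target[1]) ** 2) / 50.0)
particles = [(random.uniform(40, 60), random.uniform(40, 60)) for _ in range(500)]
for _ in range(10):
    particles, estimate = particle_filter_step(particles, confidence)
```

    After a few iterations the particle cloud concentrates around the target, so the weighted-mean estimate lands close to (50, 50).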

  • RESEARCH ARTICLE
    Wei SHAO, Yi DING, Hong-Bin SHEN, Daoqiang ZHANG
    Frontiers of Computer Science, 2017, 11(2): 243-252. https://doi.org/10.1007/s11704-017-6538-2

    Protein subcellular localization prediction is important for studying the function of proteins. With the significant recent progress in microscopic imaging, automatically determining the subcellular localization of proteins from bio-images is becoming a new research hotspot. A central question in this field is which features are suitable for describing protein images. Existing feature extraction methods are usually hand-crafted and extract only one layer of features, which may be insufficient to represent complex protein images. To this end, we propose a deep-model-based descriptor (DMD) to extract high-level features from protein images. Specifically, to make the extracted features more generic, we first train a convolutional neural network (AlexNet) on a natural image set with millions of labeled images, and then use a partial parameter transfer strategy to fine-tune the parameters from natural images to protein images. After that, we apply the Lasso model to select the most distinguishing features from the last fully connected layer of the CNN, and use these selected features for the final classification. Experimental results on a protein image dataset validate the efficacy of our method.
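
    The Lasso-based selection step can be illustrated with a tiny coordinate-descent Lasso on synthetic data. This is a hedged sketch, not the paper's pipeline: the dimensions (50 rather than a 4096-dimensional fully connected layer), the synthetic "activations", and the `lasso_cd` helper are all illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=50):
    """Minimal coordinate-descent Lasso with soft-thresholding."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            # Soft-thresholding zeroes out weakly correlated features.
            w[j] = np.sign(rho) * max(abs(rho) - n * alpha, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
# Stand-in for deep-layer activations of 200 protein images (toy-sized dims).
X = rng.standard_normal((200, 50))
# Only three hidden dimensions actually drive the label.
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(200)

w = lasso_cd(X, y, alpha=0.1)
selected = np.flatnonzero(w)   # indices kept for the final classifier
```

    The L1 penalty drives the coefficients of irrelevant dimensions to exactly zero, leaving a small subset of informative features, which is the property the descriptor relies on.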

  • RESEARCH ARTICLE
    Ge SONG, Xiaoyang TAN
    Frontiers of Computer Science, 2017, 11(2): 253-265. https://doi.org/10.1007/s11704-017-6537-3

    We present a new method to generate efficient multi-level hashing codes for image retrieval based on a deep siamese convolutional neural network (DSCNN). Conventional deep hashing methods trade the capability of capturing highly complex, nonlinear semantic information of images against very compact hash codes, usually yielding high retrieval efficiency at the cost of accuracy. We relax the restrictive compactness requirement on hash codes by extending them to a two-level hierarchical coding scheme: the first level captures the high-level semantic information extracted by the deep network using a rich encoding strategy, while the second level squeezes it into more global and compact codes. At run time, instead of using the full first-level hash codes, an attention-based mechanism selects the bits most essential to each query image, guided by the hash codes generated at the second level and thus exploiting both local and global properties of the deep features. Experimental results on several popular datasets demonstrate the advantages of the proposed method over several state-of-the-art methods.
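
    The query-time idea (ranking by Hamming distance over a query-specific subset of essential bits) can be sketched with plain NumPy. Everything here is an assumption for illustration: `bit_importance` stands in for the attention weights that the paper derives from the second-level codes, and the database codes are random.

```python
import numpy as np

rng = np.random.default_rng(1)
n_db, n_bits = 1000, 128
# Hypothetical first-level (rich) binary codes for a 1000-image database.
db_codes = rng.integers(0, 2, size=(n_db, n_bits), dtype=np.uint8)

def retrieve(query_code, bit_importance, db, k=5, n_active=32):
    """Rank database codes by Hamming distance over the query's most
    important bits instead of the full code."""
    active = np.argsort(bit_importance)[-n_active:]          # essential bits
    dists = (db[:, active] != query_code[active]).sum(axis=1)
    return np.argsort(dists, kind="stable")[:k]

query = db_codes[42].copy()                 # query identical to entry 42
bit_importance = rng.random(n_bits)         # hypothetical attention scores
top = retrieve(query, bit_importance, db_codes)
```

    Comparing 32 bits instead of 128 cuts the per-candidate cost while, in the paper's scheme, the attention mechanism is meant to keep the bits that matter for this particular query.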

  • RESEARCH ARTICLE
    Hao ZHENG, Xin GENG
    Frontiers of Computer Science, 2017, 11(2): 266-275. https://doi.org/10.1007/s11704-016-5204-4

    Considering the distinctiveness of different group features in sparse representation, a novel joint multitask and weighted group sparsity (JMT-WGS) method is proposed. By weighting the popular group sparsity term, the representation coefficients from the same class over their associated dictionaries share similarity, while the representation coefficients from different classes retain sufficient diversity. The proposed method is cast into a multi-task framework with a two-stage iteration. In the first stage, the representation coefficients are optimized by the accelerated proximal gradient method with the weights fixed. In the second stage, the weights are computed from prior information about their entropy. Experimental results on three facial expression databases show that the proposed algorithm outperforms other state-of-the-art algorithms, demonstrating its promising performance.

  • RESEARCH ARTICLE
    Junge ZHANG, Kaiqi HUANG, Tieniu TAN, Zhaoxiang ZHANG
    Frontiers of Computer Science, 2017, 11(4): 632-648. https://doi.org/10.1007/s11704-016-5530-6

    Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses the problem of structure modeling and representation from two aspects: visual features and topology models. First, at the feature level, we propose the Local Structured Descriptor to capture an object's local structure effectively, and develop descriptors from shape and texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that our method achieves very competitive performance.

  • RESEARCH ARTICLE
    Le DONG, Ning FENG, Mengdie MAO, Ling HE, Jingjing WANG
    Frontiers of Computer Science, 2017, 11(4): 649-660. https://doi.org/10.1007/s11704-016-5558-7

    Efficient, interactive foreground/background segmentation of video is of great practical importance in video editing. This paper proposes an interactive and unsupervised video object segmentation algorithm named E-GrabCut, which targets both the segmentation quality and the time efficiency demanded in this field. The proposed algorithm has three features. First, we have developed a powerful, non-iterative version of the optimization process for each frame. Second, additional user interaction in the first frame is used to improve the Gaussian mixture model (GMM). Third, a robust algorithm for segmenting subsequent frames has been developed by reusing the previous GMM. Extensive experiments demonstrate that our method outperforms state-of-the-art video segmentation algorithms in terms of combined time efficiency and segmentation quality.
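
    The model-reuse idea can be sketched with color models fitted on user-marked pixels in the first frame and reused to label pixels in a later frame. This is a loose illustration under stated assumptions: a single Gaussian per region stands in for the paper's GMM, and the colors and pixel counts are synthetic.

```python
import numpy as np

def fit_color_model(pixels):
    """Fit one Gaussian color model per region (stand-in for a full GMM)."""
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels.T) + 1e-3 * np.eye(3)   # regularize for invertibility
    return mean, np.linalg.inv(cov), np.log(np.linalg.det(cov))

def log_score(pixels, model):
    """Unnormalized Gaussian log-likelihood of each pixel under the model."""
    mean, inv_cov, log_det = model
    d = pixels - mean
    return -0.5 * np.einsum("ij,jk,ik->i", d, inv_cov, d) - 0.5 * log_det

rng = np.random.default_rng(0)
# Frame 1: pixels the user marked as foreground (reddish) / background (bluish).
fg_model = fit_color_model(rng.normal([200, 60, 60], 15, (300, 3)))
bg_model = fit_color_model(rng.normal([60, 60, 200], 15, (300, 3)))

# A later frame: reuse the frame-1 models, no new user interaction needed.
frame_pixels = rng.normal([195, 65, 60], 15, (100, 3))   # still object-colored
is_foreground = log_score(frame_pixels, fg_model) > log_score(frame_pixels, bg_model)
```

    Because the object's color distribution changes slowly between frames, the models fitted once can classify subsequent frames, which is what makes the per-frame step cheap.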

  • RESEARCH ARTICLE
    Le DONG, Wenpu DONG, Ning FENG, Mengdie MAO, Long CHEN, Gaipeng KONG
    Frontiers of Computer Science, 2017, 11(6): 1023-1035. https://doi.org/10.1007/s11704-016-5538-y

    Color descriptors of an image are the most widely used visual features in content-based image retrieval systems. In this study, we present a novel color-based image retrieval framework that integrates color space quantization and feature coding. Although color features have advantages such as robustness and simple extraction, directly processing the abundant color information in an RGB image is challenging. To overcome this problem, a color space clustering quantization algorithm is proposed to obtain the clustering color space (CCS) by clustering the CIE 1976 L∗a∗b∗ space into 256 distinct colors, which adequately accommodate human visual perception. In addition, a new feature coding method called feature-to-character coding (FCC) is proposed to encode block-based main color features into character codes. In this method, images are represented by character codes, which makes it efficient to build an inverted index on color features and to utilize text-based search engines. Benefiting from its highly efficient computation, the proposed framework can also be applied to large-scale web image retrieval. The experimental results demonstrate that the proposed system significantly outperforms block-based main color image retrieval systems that use the traditional HSV (hue, saturation, value) quantization method.
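
    The two stages (cluster the color space into a palette, then encode each block's main color as a character) can be sketched as follows. This is a toy illustration: k is 4 rather than the paper's 256 CCS colors, the 3-D "Lab" samples are synthetic, and the letter mapping in `fcc_encode` is an assumption about the coding, not the paper's exact scheme.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain k-means, standing in for the clustering of CIE Lab space."""
    # Deterministic farthest-point initialization keeps the demo stable.
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(points[:, None] - np.array(centers)[None],
                           axis=2).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.linalg.norm(points[:, None] - centers[None],
                                axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def fcc_encode(block_colors, palette):
    """Feature-to-character coding: map each block's main color to one
    character, so an image becomes a string a text engine can index."""
    idx = np.linalg.norm(block_colors[:, None] - palette[None],
                         axis=2).argmin(axis=1)
    return "".join(chr(ord("a") + int(i)) for i in idx)

rng = np.random.default_rng(0)
# Synthetic Lab-like samples drawn around four well-separated colors.
true_colors = np.array([[30., 20., 20.], [80., -40., 40.],
                        [60., 60., -30.], [90., 0., 0.]])
samples = true_colors[rng.integers(0, 4, 400)] + rng.normal(0, 2, (400, 3))
palette = kmeans(samples, k=4)

blocks = true_colors[[0, 1, 1, 3, 2]]   # main colors of five image blocks
code = fcc_encode(blocks, palette)
```

    Two images with the same block-wise main colors produce identical strings, so an inverted index over these character codes behaves exactly like a text index over words.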