Prior knowledge is a significant supplement to image-based 3D modeling algorithms, refining otherwise fragile consistency-based stereo. In this paper, we review the image-based 3D modeling problem according to two prior categories: classical priors and specific priors. Classical priors, including smoothness, silhouette, and illumination, are well studied for improving the accuracy and robustness of 3D reconstruction. In recent years, various specific priors that take advantage of the Manhattan-world assumption, geometric templates, and trained category features have been proposed to further enhance modeling performance. The advantages and limitations of both kinds of priors are discussed and evaluated in the paper. Finally, we discuss future trends and challenges in the study of priors.
A wide variety of predictive analytics techniques have been developed in statistics, machine learning, and data mining; however, many of these algorithms take a black-box approach in which data is input and future predictions are output with no insight into what goes on during the process. Unfortunately, such a closed-system approach often leaves little room for injecting domain expertise and can leave analysts frustrated when results seem spurious or confusing. To enable more human-centric approaches, the visualization community has begun developing methods that let users incorporate expert knowledge into the prediction process at all stages, including data cleaning, feature selection, model building, and model validation. This paper surveys current progress and trends in predictive visual analytics, identifies the common framework in which predictive visual analytics systems operate, and summarizes the predictive analytics workflow.
Robust face representation is imperative for highly accurate face recognition. In this work, we propose an open-source face recognition method with deep representation named VIPLFaceNet, a 10-layer deep convolutional neural network with seven convolutional layers and three fully-connected layers. Compared with the well-known AlexNet, our VIPLFaceNet requires only 20% of the training time and 60% of the testing time, yet achieves a 40% drop in error rate on the real-world face recognition benchmark LFW. VIPLFaceNet achieves 98.60% mean accuracy on LFW using a single network. An open-source C++ SDK based on VIPLFaceNet is released under the BSD license; the SDK takes about 150 ms to process one face image in a single thread on an i7 desktop CPU. VIPLFaceNet provides a state-of-the-art starting point for both academic and industrial face recognition applications.
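To make the seven-convolutional-plus-three-fully-connected layout concrete, here is a minimal PyTorch sketch of such a 10-layer network. The channel widths, kernel sizes, pooling schedule, input size, and class count are all illustrative assumptions; the abstract does not specify VIPLFaceNet's exact configuration.

```python
# A 10-layer CNN sketch: 7 conv layers + 3 fully-connected layers.
# All widths and the pooling positions are assumed, not the paper's values.
import torch
import torch.nn as nn

class FaceNet10(nn.Module):
    def __init__(self, num_classes=10575, in_size=224):
        super().__init__()
        chans = [3, 48, 96, 96, 192, 192, 256, 256]   # assumed channel widths
        convs = []
        for i in range(7):
            convs += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                      nn.ReLU(inplace=True)]
            if i in (0, 2, 4, 6):                      # assumed pooling schedule
                convs.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*convs)
        feat_dim = chans[-1] * (in_size // 16) ** 2    # 4 pools halve 4 times
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 2048), nn.ReLU(inplace=True),
            nn.Linear(2048, 2048), nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes))

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

# model = FaceNet10(); logits = model(torch.randn(1, 3, 224, 224))
```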
Deep learning has been the most popular feature learning method for a variety of computer vision applications in the past three years. Not surprisingly, this technique, especially the convolutional neural network (ConvNet) structure, has been exploited to recognize human actions with great success. Most existing algorithms directly adopt the basic ConvNet structure, which works well in ideal situations, e.g., under stable lighting conditions. However, its performance degrades significantly under large intra-class variation in image appearance. To solve this problem, we propose a new method that integrates semantically meaningful attributes into deep learning's hierarchical structure. The idea is to add simple yet effective attributes at the category level of the ConvNet so that the attribute information can drive the learning procedure. Experimental results on three popular action recognition databases show that embedding auxiliary multiple attributes into the deep learning framework improves classification accuracy significantly.
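One plausible way to read "attributes at the category level" is a shared backbone with an auxiliary attribute head whose loss shapes the shared representation. The sketch below illustrates that multi-task pattern; the backbone, attribute count, and loss weighting are assumptions, not the paper's exact design.

```python
# Sketch: a shared ConvNet backbone with a category head and an attribute
# head; the joint loss lets attribute supervision drive feature learning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeConvNet(nn.Module):
    def __init__(self, num_actions=101, num_attributes=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.action_head = nn.Linear(64, num_actions)
        self.attr_head = nn.Linear(64, num_attributes)

    def forward(self, x):
        feat = self.backbone(x)
        return self.action_head(feat), self.attr_head(feat)

def joint_loss(logits, attr_logits, labels, attrs, lam=0.5):
    # category cross-entropy plus attribute BCE; lam balances the two tasks
    ce = F.cross_entropy(logits, labels)
    bce = F.binary_cross_entropy_with_logits(attr_logits, attrs)
    return ce + lam * bce
```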
Visual tracking is a popular research area in computer vision, yet it remains difficult to realize because of challenges such as changes in scale and illumination, rotation, fast motion, and occlusion. Consequently, the focus in this research area is on making tracking algorithms adapt to these changes so as to achieve stable and accurate visual tracking. This paper proposes a visual tracking algorithm that integrates the scale invariance of SURF features with deep learning to enhance tracking robustness when the size of the tracked object changes significantly. A particle filter is used for motion estimation; the confidence of each particle is computed via a deep neural network, and the particle filter's result is verified and corrected by mean shift, owing to its computational efficiency and insensitivity to external interference. Both qualitative and quantitative evaluations on challenging benchmark sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art methods across the challenging factors of visual tracking, especially scale variation.
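The core loop described here, particles scored by a network and a weighted estimate that mean shift would then refine, can be sketched compactly. In the sketch below the state layout (x, y, scale), the Gaussian random-walk motion model, and the resampling rule are illustrative assumptions, and `confidence_fn` stands in for the deep network.

```python
# One particle-filter step: propagate, re-weight by network confidence,
# estimate, and resample when the effective sample size collapses.
import numpy as np

def particle_filter_step(particles, weights, confidence_fn,
                         motion_std=(4.0, 4.0, 0.02)):
    # propagate particles with Gaussian random-walk motion over (x, y, scale)
    particles = particles + np.random.randn(*particles.shape) * motion_std
    # re-weight each particle by the (deep-network) confidence score
    weights = weights * np.array([confidence_fn(p) for p in particles])
    weights /= weights.sum() + 1e-12
    # state estimate: confidence-weighted mean (mean shift would refine it)
    estimate = (weights[:, None] * particles).sum(axis=0)
    # resample when the effective sample size drops below half the particles
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = np.random.choice(len(weights), len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights, estimate
```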
Protein subcellular localization prediction is important for studying protein function. Recently, with significant progress in the field of microscopic imaging, automatically determining the subcellular localization of proteins from bio-images has become a new research hotspot. One of the central themes in this field is determining which features are suitable for describing protein images. Existing feature extraction methods are usually hand-crafted, extracting only a single layer of features, which may not be sufficient to represent complex protein images. To this end, we propose a deep-model-based descriptor (DMD) to extract high-level features from protein images. Specifically, to make the extracted features more generic, we first trained a convolutional neural network (AlexNet) on a natural image set with millions of labeled images, and then used a partial parameter transfer strategy to fine-tune the parameters from natural images to protein images. After that, we applied the Lasso model to select the most distinguishing features from the last fully-connected layer of the CNN and used these selected features for the final classification. Experimental results on a protein image dataset validate the efficacy of our method.
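The described pipeline, pretrained AlexNet features followed by Lasso-based selection and a final classifier, can be sketched as below. The feature layer (fc7 of torchvision's AlexNet), the one-vs-rest MultiTaskLasso formulation, the alpha value, and the linear SVM are assumptions standing in for the paper's exact choices.

```python
# Sketch of the DMD stages: deep feature extraction, Lasso selection,
# then classification on the surviving dimensions.
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import MultiTaskLasso
from sklearn.svm import LinearSVC

def extract_fc_features(images):
    # images: float tensor (N, 3, 224, 224), ImageNet-normalized
    alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
    # keep everything up to (but not including) the final class layer -> fc7
    feat_net = torch.nn.Sequential(alexnet.features, alexnet.avgpool,
                                   torch.nn.Flatten(),
                                   *list(alexnet.classifier.children())[:-1])
    with torch.no_grad():
        return feat_net(images).numpy()              # (N, 4096) features

def lasso_select_and_classify(X, y, num_classes, alpha=0.01):
    Y = np.eye(num_classes)[y]                       # one-hot targets for Lasso
    lasso = MultiTaskLasso(alpha=alpha).fit(X, Y)
    keep = np.linalg.norm(lasso.coef_, axis=0) > 0   # dimensions Lasso kept
    clf = LinearSVC().fit(X[:, keep], y)
    return keep, clf
```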
We present a new method to generate efficient multi-level hash codes for image retrieval based on a deep siamese convolutional neural network (DSCNN). Conventional deep hashing methods trade the capability of capturing highly complex and nonlinear semantic information of images against very compact hash codes, usually leading to high retrieval efficiency but deteriorated accuracy. We alleviate the restrictive compactness requirement on hash codes by extending them to a two-level hierarchical coding scheme, in which the first level captures the high-level semantic information extracted by the deep network using a rich encoding strategy, while the second level squeezes it into more global and compact codes. At query time, instead of using the full first-level hash codes, we adopt an attention-based mechanism to select the most essential bits specific to each query image for retrieval. The attention mechanism is guided by the hash codes generated by the second level, taking advantage of both local and global properties of the deep features. Experimental results on several popular datasets demonstrate the advantages of the proposed method compared to several state-of-the-art methods.
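To illustrate per-query bit selection, here is a minimal sketch in which each query keeps only its k most confident first-level bits and compares candidates on those bits alone. Ranking bits by pre-binarization activation magnitude is a simple stand-in for the paper's second-level-guided attention, and k is an assumed parameter.

```python
# Masked Hamming search: rank database codes using only the query's
# top-k "essential" first-level bits.
import numpy as np

def masked_hamming_search(query_act, db_codes, k=48):
    # query_act: real-valued first-level activations for one query
    # db_codes:  (N, D) binary first-level codes of the database
    query_code = (query_act > 0).astype(np.uint8)
    essential = np.argsort(-np.abs(query_act))[:k]    # top-k confident bits
    dists = (db_codes[:, essential] != query_code[essential]).sum(axis=1)
    return np.argsort(dists)                          # candidates, best first
```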
Considering the distinctiveness of different group features in sparse representation, a novel joint multitask and weighted group sparsity (JMT-WGS) method is proposed. By weighting popular group sparsity, the representation coefficients from the same class over their associated dictionaries share some similarity, while the representation coefficients from different classes retain enough diversity. The proposed method is cast into a multi-task framework with a two-stage iteration. In the first stage, the representation coefficients are optimized by an accelerated proximal gradient method with the weights fixed. In the second stage, the weights are computed from prior information about their entropy. Experimental results on three facial expression databases show that the proposed algorithm outperforms several state-of-the-art algorithms, demonstrating its promising performance.
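The first stage is a standard accelerated proximal gradient (FISTA-style) scheme whose proximal operator, for a weighted group-sparsity penalty sum_g w_g ||x_g||_2, is group-wise soft-thresholding. The sketch below shows one such solver for a least-squares data term; the group partition, step size, and regularization strength are assumptions, and the entropy-based weight update of the second stage is not shown.

```python
# Accelerated proximal gradient for min_x 0.5*||Dx - y||^2 + lam * sum_g w_g ||x_g||_2
import numpy as np

def prox_weighted_group(x, groups, w, t):
    # group-wise soft-thresholding: shrink each group's l2 norm by t * w_g
    out = x.copy()
    for g, idx in enumerate(groups):
        norm = np.linalg.norm(x[idx])
        out[idx] = 0.0 if norm == 0 else x[idx] * max(0.0, 1 - t * w[g] / norm)
    return out

def apg_weighted_group_lasso(D, y, groups, w, lam=0.1, iters=200):
    L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    z, s = x.copy(), 1.0
    for _ in range(iters):
        grad = D.T @ (D @ z - y)
        x_new = prox_weighted_group(z - grad / L, groups, w, lam / L)
        s_new = (1 + np.sqrt(1 + 4 * s * s)) / 2
        z = x_new + ((s - 1) / s_new) * (x_new - x)   # Nesterov momentum
        x, s = x_new, s_new
    return x
```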
Robot recognition tasks usually require multiple homogeneous or heterogeneous sensors, which intrinsically generate sequential, redundant, and storage-demanding data corrupted by various kinds of noise. Thus, online machine learning algorithms that perform efficient sensory feature fusion have become a hot topic in the robot recognition domain. This paper proposes an online multi-kernel extreme learning machine (OM-ELM) that assembles multiple ELM classifiers and optimizes the kernel weights with a p-norm formulation of the multi-kernel learning (MKL) problem. It can be applied in feature fusion applications that require incremental learning over multiple sequential sensory readings. The performance of OM-ELM is tested on four different robot recognition tasks. Compared with several state-of-the-art online models for multi-kernel learning, our method achieves superior or equivalent training accuracy and generalization ability with less training time. Practical suggestions are also given to aid effective online fusion of robot sensory features.
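A bare-bones view of the ingredients is a kernel-ELM fit over a weighted sum of Gram matrices, alternated with a p-norm normalized kernel-weight update. The sketch below is illustrative only: the weight update shown (scoring each kernel by its contribution and re-projecting onto the p-norm unit ball) is a common MKL heuristic standing in for the paper's optimization, and C and p are assumed hyperparameters.

```python
# Kernel ELM over a weighted kernel combination, with a p-norm weight update.
import numpy as np

def kernel_elm_fit(kernels, w, T, C=10.0):
    # kernels: list of (N, N) Gram matrices; T: (N, classes) 0/1 targets
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    return np.linalg.solve(K + np.eye(K.shape[0]) / C, T)   # output weights beta

def update_weights_pnorm(kernels, beta, p=2.0):
    # score each kernel by the norm of its contribution, then renormalize
    # the scores onto the p-norm unit ball
    scores = np.array([np.linalg.norm(Ki @ beta) for Ki in kernels])
    w = scores ** (2.0 / (p + 1))
    return w / np.linalg.norm(w, ord=p)
```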
Smartphone applications (apps) are becoming increasingly popular all over the world, particularly among the Chinese Generation Y population; surprisingly, however, only a small number of studies on the app factors valued by this important group have been conducted. Because competition among app developers is increasing, the app factors that attract users' attention are worth studying for sales promotion. This paper examines these factors through two separate studies. In the first study (Experiment 1), a survey employing perceptual rating and verbal protocol methods, 90 randomly selected app websites are rated by 169 experienced smartphone users according to app attraction. Twelve of the rated apps (the six highest and the six lowest rated) are selected for further investigation, and 11 influential factors valued by Generation Y members are identified. A second study (Experiment 2) is conducted using the highest- and lowest-rated app websites from Experiment 1, with eye tracking and verbal protocol methods. The eye movements of 45 participants are tracked while they browse these websites, providing evidence about what attracts these users' attention and the order in which app components are viewed. The results of the two studies suggest that Chinese Generation Y is a content-centric group when browsing the smartphone app marketplace. Icon, screenshot, price, rating, and name are the dominant and indispensable factors influencing purchase intentions, among which icon and screenshot should be meticulously designed. Price is another key factor that drives Chinese Generation Y's attention, while recommended apps are the least dominant element. Design suggestions for app websites are also proposed, giving this research important practical implications.
String similarity join is an essential operation of many applications that need to find all similar string pairs from two given collections. A quantitative way to determine whether two strings are similar is to compute their similarity based on a certain similarity function; string pairs with similarity above a given threshold are regarded as results. The current approach to the similarity join problem uses a single threshold value. There are, however, several scenarios that require the support of multiple thresholds, for instance, when the dataset includes strings of various lengths: longer string pairs typically tolerate many more typos than shorter ones. We therefore propose a solution for string similarity joins that supports different similarity thresholds in a single operator. To support different thresholds, we devise two novel indexing techniques: partition-based indexing and similarity-aware indexing. To utilize the new indices and improve join performance, we propose new filtering methods and index probing techniques. To the best of our knowledge, this is the first work that addresses this problem. Experimental results on real-world datasets show that our solution performs efficiently while providing more flexible threshold specification.
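The multiple-threshold idea can be made concrete with a length-dependent edit-distance budget. In the sketch below, the threshold rule (10% of the longer string's length, at least 1) is an illustrative assumption, and the quadratic nested loop merely stands in for the paper's partition-based and similarity-aware indexes.

```python
# Similarity join where the edit-distance threshold grows with string
# length, so longer pairs tolerate more typos than shorter ones.
def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance, one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def multi_threshold_join(R, S):
    threshold = lambda a, b: max(1, int(0.1 * max(len(a), len(b))))
    return [(a, b) for a in R for b in S
            if edit_distance(a, b) <= threshold(a, b)]

# multi_threshold_join(["similarity", "join"], ["similarty", "joyn"])
```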
The number of constraints imposed on the surface, the light source, the camera model, and in particular the initial information makes shape from shading (SFS) very difficult to apply in real applications. A considerable number of approaches require initial data about the 3D object, such as boundary conditions (BC). However, it is difficult to obtain this information for each point of the object edge in the image, which limits the applicability of these approaches. This paper presents an improvement of the Global View method proposed by Zhu and Shi [
In this paper we propose an optimization framework for the interior carving of 3D fabricated shapes. Interior carving is an important technique, widely used in industrial and artistic design, that achieves functional purposes by hollowing interior shapes out of objects. We formulate such a functional purpose as the objective function of an optimization problem whose solution indicates the optimal interior shape. In contrast to previous volumetric methods, we directly represent the boundary of the interior shape as a triangular mesh. We use the Eulerian semiderivative to relate the time derivative of the objective function to a virtual velocity field, and we iteratively evolve the interior shape guided by this velocity field with surface tracking. In each iteration, we compute a velocity field that guarantees a decrease of the objective function by solving a linear programming problem. We demonstrate this general framework on a novel application, designing objects that float in fluid, as well as on two previously investigated applications, and we print various optimized objects to verify its effectiveness.
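The per-iteration linear program can be pictured as follows: with the shape derivative linearizing the objective as dJ ≈ g · v over per-vertex normal speeds v, a bounded v with g · v < 0 is a descent velocity. The sketch below shows that step under assumed ingredients (a trust-region bound vmax and optional extra linear constraints such as minimum wall thickness); it is not the paper's exact formulation.

```python
# One descent iteration: pick a bounded per-vertex normal velocity that
# decreases the linearized objective, via a linear program.
import numpy as np
from scipy.optimize import linprog

def descent_velocity(g, vmax=0.01, A_ub=None, b_ub=None):
    # minimize g . v  subject to  -vmax <= v_i <= vmax
    # (optional rows A_ub v <= b_ub encode extra feasibility constraints)
    n = g.size
    res = linprog(c=g, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-vmax, vmax)] * n, method="highs")
    return res.x   # velocity sampled at the interior-mesh vertices
```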
3D printing has become a promising technique for industrial production. This paper presents research on the manufacturability optimization of discrete products under the influence of 3D printing technology. We first model the problem using a tree structure and then formulate it as a linear integer program in which the total production time is minimized subject to a production cost constraint. To solve the problem, a differential evolution (DE) algorithm is developed that automatically determines whether traditional manufacturing methods or 3D printing should be used for each part of the product. The algorithm is quantitatively evaluated on a synthetic dataset against exhaustive search and alternating optimization solutions. Simulation results show that the proposed algorithm effectively combines traditional manufacturing methods and 3D printing in production, helping to attain optimized product design and process planning with respect to manufacturing time. It thereby provides a useful reference for the wide application and further industrialization of 3D printing technology.
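The decision variable here is a binary choice per part (print it or manufacture it traditionally), which a real-coded DE can search by thresholding genes at 0.5. The sketch below minimizes total time with a penalty when the cost budget is exceeded; the time/cost vectors, budget, penalty weight, and DE hyperparameters are illustrative assumptions, and the paper's tree-structured product model is not encoded.

```python
# Differential evolution over binary part assignments
# (decoded gene > 0.5 means "3D-print this part").
import numpy as np

def de_assign(time_3dp, time_trad, cost_3dp, cost_trad, budget,
              pop=30, gens=200, F=0.8, CR=0.9, penalty=1e6):
    rng = np.random.default_rng(0)
    n = len(time_3dp)

    def fitness(x):
        # x: boolean vector; total time plus a penalty for busting the budget
        time = np.where(x, time_3dp, time_trad).sum()
        cost = np.where(x, cost_3dp, cost_trad).sum()
        return time + penalty * max(0.0, cost - budget)

    P = rng.random((pop, n))                        # real-coded population
    for _ in range(gens):
        for i in range(pop):
            a, b, c = P[rng.choice(pop, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0, 1)  # DE/rand/1 mutation
            cross = rng.random(n) < CR               # binomial crossover
            trial = np.where(cross, mutant, P[i])
            if fitness(trial > 0.5) < fitness(P[i] > 0.5):
                P[i] = trial                         # greedy selection
    best = P[np.argmin([fitness(x > 0.5) for x in P])]
    return best > 0.5                                # final binary assignment
```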