Aug 2018, Volume 19 Issue 6
    

  • Original Article
    Divya PANDOVE, Shivani GOEL, Rinkle RANI

    Correlation analysis is an effective mechanism for studying patterns in data and making predictions. Many interesting discoveries have been made by identifying correlations in seemingly unrelated data. We propose an algorithm that quantifies the theory of correlations and yields a more intuitive and more accurate correlation coefficient. This predictive metric for paired values, called the general rank-based correlation coefficient, fulfills the five basic criteria of a predictive metric: independence from sample size, a value between -1 and 1, measurement of the degree of monotonicity, insensitivity to outliers, and intuitive demonstration. Furthermore, the metric has been validated through experiments on a real-time dataset and random number simulations. Mathematical derivations of the proposed equations are also provided. We compare the metric with Spearman's rank correlation coefficient, and the results show that the proposed metric fares better than the existing metric on all the predictive metric criteria.
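
    The abstract benchmarks the proposed coefficient against Spearman's rank correlation coefficient. Because the formula of the general rank-based coefficient is not given here, the following sketch implements only that Spearman baseline (average ranks for ties, then the Pearson correlation of the ranks), which makes the outlier-insensitivity criterion concrete. The function names are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    print(spearman_rho([1, 2, 3, 4], [1, 2, 3, 1000]))  # → 1.0
```

    With this baseline, the extreme value 1000 changes only a magnitude, not a rank, so the coefficient stays at 1.0.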

  • Original Article
    Ahmad FIRDAUS, Nor Badrul ANUAR, Ahmad KARIM, Mohd Faizal Ab RAZAK

    Mobile device manufacturers are rapidly producing diverse Android versions worldwide. Meanwhile, cyber criminals carry out malicious actions such as tracking user activities, stealing personal data, and committing bank fraud. These criminals benefit greatly because so many people rely on Android for their daily routines, including important communications. With this in mind, security practitioners have applied static and dynamic analyses to identify malware. This study used static analysis because of its broad code coverage, low resource consumption, and rapid processing. However, static analysis requires a minimal number of features to classify malware efficiently. Therefore, we used genetic search (GS), a search based on a genetic algorithm (GA), to select features from among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers: Naïve Bayes (NB), functional trees (FT), J48, random forest (RF), and multilayer perceptron (MLP). Among these classifiers, FT gave the highest accuracy (95%) and true positive rate (TPR) (96.7%) using only six features.
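
    The abstract does not describe the GS configuration. As a hedged illustration, the following sketch is a generic genetic search over feature-subset bit masks (binary tournament selection, one-point crossover, bit-flip mutation, elitism). The population size, generation count, mutation rate, and the `fitness` callback are all assumptions; in practice the fitness would score a classifier trained on the selected features, possibly with a penalty on subset size. The toy fitness in the demo is hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Mask = Tuple[bool, ...]  # True at index i means feature i is selected


def genetic_feature_search(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 50,
    mutation_rate: Optional[float] = None,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Generic GA over feature masks; returns the best mask and the best fitness per generation."""
    if n_features < 1 or pop_size < 2:
        raise ValueError("need at least one feature and two individuals")
    rng = random.Random(seed)
    p_mut = mutation_rate if mutation_rate is not None else 1.0 / n_features
    pop: List[Mask] = [
        tuple(rng.random() < 0.5 for _ in range(n_features)) for _ in range(pop_size)
    ]

    def tournament(scored: List[Tuple[float, Mask]]) -> Mask:
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    history: List[float] = []
    for _ in range(generations):
        scored = [(fitness(m), m) for m in pop]
        best_fit, best = max(scored, key=lambda s: s[0])
        history.append(best_fit)
        nxt: List[Mask] = [best]  # elitism: the best mask survives unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            # Bit-flip mutation: XOR each bit with a Bernoulli(p_mut) draw.
            nxt.append(tuple(bit != (rng.random() < p_mut) for bit in child))
        pop = nxt

    scored = [(fitness(m), m) for m in pop]
    best_fit, best = max(scored, key=lambda s: s[0])
    history.append(best_fit)
    return best, history


if __name__ == "__main__":
    # Hypothetical toy fitness: agreement with a target subset, minus a size penalty.
    target = (True, False, True, False, False, True, False, False)

    def toy_fitness(mask: Mask) -> float:
        return sum(a == b for a, b in zip(mask, target)) - 0.1 * sum(mask)

    best_mask, hist = genetic_feature_search(len(target), toy_fitness, seed=1)
    print(best_mask, hist[-1])
```

    Because the fittest mask is copied unchanged into each new generation, the best fitness recorded in the history never decreases.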

  • Original Article
    Deng CHEN, Yan-duo ZHANG, Wei WEI, Rong-cun WANG, Xiao-lin LI, Wei LIU, Shi-xun WANG, Rui ZHU

    Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, as with any data-mining technique, this approach requires sufficient training data (object usage scenarios). Existing approaches address the problem by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for object usage scenarios (OUSs). Our technique is based on the inheritance relationship in object-oriented programs. Given an object-oriented program p, the number of OUSs that can be collected from a run of p is generally no more than the number of objects used during the run. With our technique, up to n times more OUSs can be obtained, where n is the average number of superclasses of all general OUSs. To investigate the effect of our technique, we implemented it in our previous prototype tool, ISpecMiner, and used the tool to mine protocols from several real-world programs. Experimental results show that our technique collects 1.95 times more OUSs than general approaches. Additionally, accurate and complete API protocols are more likely to be obtained. Furthermore, our technique can mine API protocols for classes that are never used in the programs, which is valuable for validating software architectures, documenting programs, and understanding programs. Although our technique introduces some runtime overhead, the overhead is small and acceptable.
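
    The abstract does not spell out how superclass scenarios are derived. One plausible reading, sketched here purely as a hypothetical illustration, is to reinterpret each recorded OUS from the viewpoint of every superclass of the object's class, keeping only the calls that superclass can answer; each scenario then contributes at most one extra scenario per superclass. The toy class hierarchy, the single-inheritance and acyclic-hierarchy assumptions, and all names are invented for illustration and are not ISpecMiner's implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: an OUS is (class name, ordered method calls on one object).
OUS = Tuple[str, Tuple[str, ...]]


def superclasses(cls: str, parent: Dict[str, str]) -> List[str]:
    """Ancestors of cls, nearest first (single, acyclic inheritance assumed)."""
    chain: List[str] = []
    cur = parent.get(cls)
    while cur is not None:
        chain.append(cur)
        cur = parent.get(cur)
    return chain


def visible_methods(cls: str, parent: Dict[str, str], declared: Dict[str, Set[str]]) -> Set[str]:
    """Methods declared by cls or inherited from any of its ancestors."""
    methods = set(declared.get(cls, set()))
    for anc in superclasses(cls, parent):
        methods |= declared.get(anc, set())
    return methods


def oversample(scenarios: List[OUS], parent: Dict[str, str],
               declared: Dict[str, Set[str]]) -> List[OUS]:
    """Original OUSs plus one projected OUS per superclass, when the projection is non-empty."""
    result = list(scenarios)
    for cls, calls in scenarios:
        for anc in superclasses(cls, parent):
            allowed = visible_methods(anc, parent, declared)
            projected = tuple(c for c in calls if c in allowed)
            if projected:
                result.append((anc, projected))
    return result


if __name__ == "__main__":
    parent = {"Leaf": "Middle", "Middle": "Base"}
    declared = {"Base": {"open", "close"}, "Middle": {"read"}, "Leaf": {"peek"}}
    runs: List[OUS] = [("Leaf", ("open", "peek", "read", "close"))]
    for ous in oversample(runs, parent, declared):
        print(ous)
```

    In this toy run, `Middle` and `Base` receive scenarios even though no object of either class was created, which loosely mirrors the abstract's point about mining protocols for classes never used in programs.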

  • Original Article
    Qiang LAN, Lin-bo QIAO, Yi-jie WANG

    In this study, we propose and compare two stochastic variants of the extra-gradient alternating direction method, namely the stochastic extra-gradient alternating direction method with Lagrangian function (SEGL) and the stochastic extra-gradient alternating direction method with augmented Lagrangian function (SEGAL), for minimizing large-scale graph-guided optimization problems whose objective is composed of two convex functions. A number of important applications in machine learning follow the graph-guided optimization formulation, such as linear regression, logistic regression, Lasso, structured extensions of Lasso, and structured regularized logistic regression. We conduct experiments on fused logistic regression and graph-guided regularized regression. Experimental results on several genres of datasets demonstrate that the proposed algorithms outperform other competing algorithms, and that SEGAL performs better than SEGL in practice.

  • Original Article
    Rabia IRFAN, Sharifullah KHAN, Kashif RAJPOOT, Ali Mustafa QAMAR

    Taxonomies are generated to organize and access large volumes of data effectively. A taxonomy is a way of representing the concepts that exist in data, and it needs to evolve continuously to reflect changes in the data. Existing automatic taxonomy generation techniques do not handle the evolution of data; therefore, the generated taxonomies do not truly represent the data. Data evolution can be handled either by regenerating the taxonomy from scratch or by allowing the taxonomy to evolve incrementally whenever the data change. The former approach is not economical in terms of time and resources. The proposed taxonomy incremental evolution (TIE) algorithm is a novel attempt to handle data that evolve over time. It serves as a layer over an existing clustering-based taxonomy generation technique and allows an existing taxonomy to evolve incrementally. The algorithm was evaluated on research articles selected from the computing domain. The taxonomy that evolved with the data under the algorithm required considerably less time and had better quality per unit time than the taxonomy regenerated from scratch.

  • Original Article
    Guo-peng XU, Hai-tang LU, Fei-fei ZHANG, Qi-rong MAO

    In dimensional affect recognition, the machine learning methods used to model and predict affect are mostly classification and regression. However, annotations in the dimensional affect space usually take the form of continuous real values with an ordinal property, and these methods do not take advantage of this important information. Therefore, we propose an affective rating ranking framework for affect recognition from face images in the valence and arousal dimensional space. Our approach makes appropriate use of the ordinal information among affective ratings, which are generated by discretizing continuous annotations. Specifically, we first train a series of basic cost-sensitive binary classifiers, each of which uses all samples, relabeled by comparing each sample's rating with the rank assigned to that binary classifier. We obtain the final affective ratings by aggregating the outputs of the binary classifiers. Comparing the experimental results with the baseline and with deep-learning-based classification and regression methods on the benchmark database of the AVEC 2015 Challenge and a selected subset of the SEMAINE database, we find that our ordinal ranking method is effective in both the arousal and valence dimensions.
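
    The ranking step resembles the standard reduction of ordinal ranking to binary classification: with ratings 1..K, the k-th classifier answers "is the rating greater than k?", and the final rating is one plus the number of "yes" answers. The following sketch shows only the relabeling, the cost-sensitive weights, and the aggregation. The classifiers, the cost functions (absolute or squared error here), and K are assumptions rather than the paper's settings, and the threshold "classifiers" in the demo are hypothetical stand-ins for trained models.

```python
from typing import Callable, List, Sequence, Tuple

Cost = Callable[[int, int], float]  # cost(true_rating, predicted_rating)


def absolute_cost(y: int, k: int) -> float:
    return float(abs(y - k))


def squared_cost(y: int, k: int) -> float:
    return float((y - k) ** 2)


def binary_subproblem(ratings: Sequence[int], k: int,
                      cost: Cost = absolute_cost) -> Tuple[List[int], List[float]]:
    """Relabel all samples for the k-th question "is the rating greater than k?".

    Labels are +1/-1. Each weight |cost(y, k) - cost(y, k + 1)| is the cost
    difference between the two ratings this question separates.
    """
    labels = [1 if y > k else -1 for y in ratings]
    weights = [abs(cost(y, k) - cost(y, k + 1)) for y in ratings]
    return labels, weights


def aggregate(decisions: Sequence[int]) -> int:
    """Combine the outputs of the K-1 classifiers (k = 1..K-1) into a rating in 1..K."""
    return 1 + sum(1 for d in decisions if d > 0)


if __name__ == "__main__":
    # Hypothetical stand-ins for trained classifiers: thresholds on a scalar score, K = 5.
    classifiers = [lambda s, k=k: 1 if s > k + 0.5 else -1 for k in range(1, 5)]
    print(aggregate([f(3.2) for f in classifiers]))  # → 3
    print(binary_subproblem([1, 2, 3, 4], 2, squared_cost))  # → ([-1, -1, 1, 1], [3.0, 1.0, 1.0, 3.0])
```

    Under squared-error cost, samples far from the threshold k receive larger weights, so errors on them are penalized more heavily during training.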

  • Original Article
    Zhen-yu LIU, Shi-en ZHOU, Jin CHENG, Chan QIU, Jian-rong TAN

    Assembly variation analysis of parts with flexible curved surfaces is much more difficult than that of solid bodies because of structural deformations during the assembly process. Most current variation analysis methods either neglect the relationships among feature points on part surfaces or treat the distributions of all feature points as identical. In this study, the problem of flexible curved surface assembly is simplified to the matching of side lines. A methodology based on Bézier curves is proposed to represent the side lines of surfaces. Through the relations between control points and data points, it solves the variation analysis problem of flexible curved surface assembly while accounting for surface continuity. The deviations of feature points on side lines are obtained from the control point distribution and are then used as inputs to commercial finite element analysis software to calculate the final product deformations. Finally, the proposed method is illustrated with two cases of antenna surface assembly.
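
    A Bézier curve of degree n is P(t) = sum over j = 0..n of C(n, j) t^j (1 - t)^(n - j) P_j for t in [0, 1], so data points sampled along a side line depend linearly on the control points P_j. The following sketch uses that relation to fit control points to data points by least squares. Chord-length parameterization and the absence of endpoint constraints are assumptions, and the step that passes deviations to finite element software is not shown.

```python
from math import comb

import numpy as np


def bernstein_matrix(t, degree: int) -> np.ndarray:
    """Rows: parameter values t_i; columns: Bernstein basis B_{j,n}(t_i), j = 0..n."""
    t = np.asarray(t, dtype=float)[:, None]
    j = np.arange(degree + 1)[None, :]
    coeffs = np.array([comb(degree, k) for k in range(degree + 1)], dtype=float)[None, :]
    return coeffs * t**j * (1.0 - t) ** (degree - j)


def bezier_points(control, t) -> np.ndarray:
    """Evaluate the Bezier curve defined by the control points at parameters t."""
    control = np.asarray(control, dtype=float)
    return bernstein_matrix(t, len(control) - 1) @ control


def fit_control_points(data, degree: int, t=None) -> np.ndarray:
    """Least-squares control points for data points (chord-length parameters by default)."""
    data = np.asarray(data, dtype=float)
    if t is None:
        seg = np.linalg.norm(np.diff(data, axis=0), axis=1)
        if seg.sum() == 0:
            raise ValueError("data points must not all coincide")
        t = np.concatenate([[0.0], np.cumsum(seg)]) / seg.sum()
    basis = bernstein_matrix(t, degree)
    control, *_ = np.linalg.lstsq(basis, data, rcond=None)
    return control


if __name__ == "__main__":
    ctrl = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [4.0, 0.0]])
    samples = bezier_points(ctrl, np.linspace(0.0, 1.0, 20))
    print(np.round(fit_control_points(samples, 3), 3))
```

    Because each data point is a Bernstein-weighted combination of the control points, displacing control point j moves every curve point by B_{j,n}(t) times that displacement, which is one way to relate control point distributions to feature point deviations.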

  • Original Article
    Chao FANG, Yang XIANG, Ke-qi QI

    We propose a general method for designing phase-shifting algorithms for grating lateral shearing interferometry. The algorithms compensate, to varying degrees, for the zeroth-order effect error and the phase-shifting error. We derive a general expression of the phase-shifting algorithm for the grating lateral shearing interferometer and introduce the corresponding design method. Based on this expression and method, four phase-shifting algorithms are designed to handle different phase-shifting errors and achieve high measurement accuracy. A new 13-frame phase-shifting algorithm is designed and simulated with a large zeroth-order effect. Simulation results verify the general expression and the corresponding design method.
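
    For reference, the textbook N-step algorithm assumes frames I_k = A + B cos(phi + delta_k) with equal steps delta_k = 2*pi*k/N (N >= 3) and recovers phi = atan2(-sum I_k sin delta_k, sum I_k cos delta_k). The following sketch implements only that baseline under those assumptions. It does not model the zeroth-order effect or compensate for phase-shifting errors, which is what the algorithms designed in the paper add.

```python
import math
from typing import List, Sequence


def simulate_frames(phase: float, n_frames: int, bias: float = 1.0,
                    modulation: float = 0.5) -> List[float]:
    """Ideal fringe intensities I_k = A + B*cos(phase + 2*pi*k/N); no errors modeled."""
    return [bias + modulation * math.cos(phase + 2 * math.pi * k / n_frames)
            for k in range(n_frames)]


def n_step_phase(intensities: Sequence[float]) -> float:
    """Wrapped phase in [-pi, pi] from N >= 3 frames with equal steps 2*pi*k/N."""
    n = len(intensities)
    if n < 3:
        raise ValueError("need at least three frames")
    num = -sum(i_k * math.sin(2 * math.pi * k / n) for k, i_k in enumerate(intensities))
    den = sum(i_k * math.cos(2 * math.pi * k / n) for k, i_k in enumerate(intensities))
    return math.atan2(num, den)


if __name__ == "__main__":
    frames = simulate_frames(phase=1.2, n_frames=13)
    print(round(n_step_phase(frames), 6))  # → 1.2
```

    With equal steps, the constant bias A sums to zero against both the sine and cosine weights, so it drops out of the estimate; an additional zeroth-order term that varies between frames would not cancel this way.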