Aug 2019, Volume 13 Issue 4
    

  • Select all
  • REVIEW ARTICLE
    Yu-Feng LI, De-Ming LIANG

    Semi-supervised learning constructs the predictive model by learning from a few labeled training examples and a large pool of unlabeled ones. It has a wide range of application scenarios and has attracted much attention in the past decades. However, it is noteworthy that although the learning performance is expected to be improved by exploiting unlabeled data, some empirical studies show that there are situations where the use of unlabeled data may degenerate the performance. Thus, it is advisable to be able to exploit unlabeled data safely. This article reviews some research progress of safe semi-supervised learning, focusing on three types of safeness issue: data quality, where the training data is risky or of low-quality;model uncertainty, where the learning algorithm fails to handle the uncertainty during training; measure diversity, where the safe performance could be adapted to diverse measures.

  • RESEARCH ARTICLE
    Thierry GAUTIER, Clément GUY, Alexandre HONORAT, Paul LE GUERNIC, Jean-Pierre TALPIN, Loïc BESNARD

    This paper investigates how state diagrams can be best represented in the polychronous model of computation (MoC) and proposes to use this model for code validation of behavior specifications in architecture analysis & design language (AADL). In this relational MoC, the basic objects are signals, which are related through dataflow equations. Signals are associated with logical clocks, which provide the capability to describe systems in which components obey multiple clock rates. We propose a model of finite-state automata, called polychronous automata, which is based on clock relationships. A specificity of this model is that an automaton is submitted to clock constraints, which allows one to specify a wide range of control-related configurations, being either reactive or restrictive with respect to their control environment. A semantic model is defined for these polychronous automata, which relies on boolean algebra of clocks. Based on a previously defined modeling method for AADL software architectures using the polychronous MoC, the proposed model is used as a formal model for the AADL behavior annex. This is illustrated with a case study involving an adaptive cruise control system.

  • RESEARCH ARTICLE
    Kai HU, Zhangbo DUAN, Jiye WANG, Lingchao GAO, Lihong SHANG

    Embedded real-time systems employ a variety of operating system platforms. Consequently, for automatic code generation, considerable redevelopment is needed when the platform changes. This results in major challenges with respect to the automatic code generation process of the architecture analysis and design language (AADL). In this paper, we propose a method of template-based automatic code generation to address this issue. Templates are used as carriers of automatic code generation rules from AADL to the object platform. These templates can be easily modified for different platforms. Automatic code generation for different platforms can be accomplished by formulating the corresponding generation rules and transformation templates. We design a set of code generation templates from AADL to the object platform and develop an automatic code generation tool. Finally, we take a typical data processing unit (DPU) system as a case study to test the tool. It is demonstrated that the autogenerated codes can be compiled and executed successfully on the object platform.

  • RESEARCH ARTICLE
    Zhibin YANG, Jean-Paul BODEVEIX, Mamoun FILALI

    This paper presents a simple and safe compiler, called MinSIGNAL, from a subset of the synchronous dataflow language SIGNAL to C, as well as its existing enhancements. The compiler follows a modular architecture, and can be seen as a sequence of source-to-source transformations applied to an intermediate representation which is named Synchronous Clocked Guarded Actions (S-CGA) and translation to sequential imperative code. Objective Caml (OCaml) is used for the implementation of MinSIGNAL. As a modern functional language, OCaml is adapted to symbolic computation and so, particularly suitable for compiler design and implementation of formal analysis tools. In particular, the safety of its type checking allows to skip some verification that would be mandatory with other languages. Additionally, this work is a basis for the formal verification of the compilation of SIGNAL with a theorem prover such as Coq.

  • RESEARCH ARTICLE
    Farid FEYZI, Saeed PARSA

    In this paper, a novel approach, Inforence, is proposed to isolate the suspicious codes that likely contain faults. Inforence employs a feature selection method, based on mutual information, to identify those bug-related statements that may cause the program to fail. Because the majority of a program faults may be revealed as undesired joint effect of the program statements on each other and on program termination state, unlike the state-of-the-art methods, Inforence tries to identify and select groups of interdependent statements which altogether may affect the program failure. The interdependence amongst the statements is measured according to their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other, considering the program static structure. Eventually, the resultant causeeffect chains are ranked according to their combined causal effect on program failure. To validate Inforence, the results of our experimentswith seven sets of programs include Siemens suite, gzip, grep, sed, space, make and bash are presented. The experimental results are then compared with those provided by different fault localization techniques for the both single-fault and multi-fault programs. The experimental results prove the outperformance of the proposed method compared to the state-of-the-art techniques.

  • RESEARCH ARTICLE
    Tao ZHU, Huiqi HU, Weining QIAN, Huan ZHOU, Aoying ZHOU

    Log-structured merge tree has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in-writing part and several read-only ones. Records are firstly written into a memoryoptimized structure and then compacted into in-disk structures periodically. It achieves high write throughput. However, it brings side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, different parts of the LSM-tree are stored in distributed fashion. To this end, a server in the query layer has to issues multiple network communications to pull data items from the underlying storage layer. Coming to its rescue, this work proposes a precise data access strategy which includes: an efficient structure with low maintaining overhead designed to test whether a record exists in the in-writing part of the LSM-tree; a lease-based synchronization strategy proposed to maintain consistent copies of the structure on remote query servers.We further prove the technique is capable of working robustly when the LSM-Tree is re-organizing multiple structures in the backend. It is also fault-tolerant, which is able to recover the structures used in data access after node failures happen. Experiments using the YCSB benchmark show that the solution has 6x throughput improvement over existing methods.

  • RESEARCH ARTICLE
    Huaizu JIANG, Ming-Ming CHENG, Shi-Jie LI, Ali BORJI, Jingdong WANG

    Recent advances in supervised salient object detection modeling has resulted in significant performance improvements on benchmark datasets. However, most of the existing salient object detection models assume that at least one salient object exists in the input image. Such an assumption often leads to less appealing saliencymaps on the background images with no salient object at all. Therefore, handling those cases can reduce the false positive rate of a model. In this paper, we propose a supervised learning approach for jointly addressing the salient object detection and existence prediction problems. Given a set of background-only images and images with salient objects, as well as their salient object annotations, we adopt the structural SVM framework and formulate the two problems jointly in a single integrated objective function: saliency labels of superpixels are involved in a classification term conditioned on the salient object existence variable, which in turn depends on both global image and regional saliency features and saliency labels assignments. The loss function also considers both image-level and regionlevel mis-classifications. Extensive evaluation on benchmark datasets validate the effectiveness of our proposed joint approach compared to the baseline and state-of-the-art models.

  • RESEARCH ARTICLE
    Jie ZHANG, Xiaowei ZHAO, Meina KAN, Shiguang SHAN, Xiujuan CHAI, Xilin CHEN

    Although the conventional active appearance model (AAM) has achieved some success for face alignment, it still suffers from the generalization problem when be applied to unseen subjects and images. To deal with the generalization problem of AAM, we first reformulate the original AAM as sparsity-regularized AAM, which can achieve more compact/better shape and appearance priors by selecting nearest neighbors as the bases of the shape and appearance model. To speed up the fitting procedure, the sparsity in sparsity-regularized AAM is approximated by using the locality (i.e., K-nearest neighbor), and thus inducing the locality-constrained active appearancemodel (LC-AAM). The LC-AAM solves a constrained AAM-like fitting problem with the K-nearest neighbors as the bases of shape and appearance model. To alleviate the adverse influence of inaccurate K-nearest neighbor results, the locality constraint is further embedded in the discriminative fitting method denoted as LC-DFM, which can find better K-nearest neighbor results by employing shape-indexed feature, and can also tolerate some inaccurate neighbors benefited from the regression model rather than the generative model in AAM. Extensive experiments on several datasets demonstrate that our methods outperform the state-of-the-arts in both detection accuracy and generalization ability.

  • RESEARCH ARTICLE
    Ratha PECH, Dong HAO, Hong CHENG, Tao ZHOU

    In high dimensional data, many dimensions are irrelevant to each other and clusters are usually hidden under noise. As an important extension of the traditional clustering, subspace clustering can be utilized to simultaneously cluster the high dimensional data into several subspaces and associate the low-dimensional subspaces with the corresponding points. In subspace clustering, it is a crucial step to construct an affinity matrix with block-diagonal form, in which the blocks correspond to different clusters. The distance-based methods and the representation-based methods are two major types of approaches for building an informative affinity matrix. In general, it is the difference between the density inside and outside the blocks that determines the efficiency and accuracy of the clustering. In this work, we introduce a well-known approach in statistic physics method, namely link prediction, to enhance subspace clustering by reinforcing the affinity matrix.More importantly,we introduce the idea to combine complex network theory with machine learning. By revealing the hidden links inside each block, we maximize the density of each block along the diagonal, while restrain the remaining non-blocks in the affinity matrix as sparse as possible. Our method has been shown to have a remarkably improved clustering accuracy comparing with the existing methods on well-known datasets.

  • RESEARCH ARTICLE
    Hui XUE, Sen LI, Xiaohong CHEN, Yunyun WANG

    Indefinite kernels have attracted more and more attentions in machine learning due to its wider application scope than usual positive definite kernels. However, the research about indefinite kernel clustering is relatively scarce. Furthermore, existing clustering methods are mainly designed based on positive definite kernels which are incapable in indefinite kernel scenarios. In this paper, we propose a novel indefinite kernel clustering algorithm termed as indefinite kernel maximum margin clustering (IKMMC) based on the state-of-the-art maximum margin clustering (MMC) model. IKMMC tries to find a proxy positive definite kernel to approximate the original indefinite one and thus embeds a new F-norm regularizer in the objective function to measure the diversity of the two kernels, which can be further optimized by an iterative approach. Concretely, at each iteration, given a set of initial class labels, IKMMC firstly transforms the clustering problem into a classification one solved by indefinite kernel support vector machine (IKSVM) with an extra class balance constraint and then the obtained prediction labels will be used as the new input class labels at next iteration until the error rate of prediction is smaller than a prespecified tolerance. Finally, IKMMC utilizes the prediction labels at the last iteration as the expected indices of clusters. Moreover, we further extend IKMMC from binary clustering problems to more complexmulti-class scenarios. Experimental results have shown the superiority of our algorithms.

  • RESEARCH ARTICLE
    Ying JIANG, Shichao LIU, Thomas EHRHARD

    We propose a fully abstract semantics for valuepassing CCS for trees (VCCTS) with the feature that processes are located at the vertices of a graph whose edges describe possible interaction capabilities. The operational semantics is given both in terms of a reduction semantics and in terms of a labelled transition semantics. We develop a theory of behavioral equivalences by introducing both weak barbed congruence and weak bisimilarity. In particular, we show that, on image-finite processes, weak barbed congruence coincides with weak bisimilarity. To illustrate potential applications and the powerful expressiveness of VCCTS, we formally compare VCCTS with some well-known models, e.g., dynamic pushdown networks, top-down tree automata and value-passing CCS.

  • RESEARCH ARTICLE
    Xiaodong MENG, Chentao WU, Minyi GUO, Long ZHENG, Jingyu ZHANG

    Energy consumption is one of the most significant aspects of large-scale storage systems where multilevel caches are widely used. In a typical hierarchical storage structure, upper-level storage serves as a cache for the lower level, forming a distributed multilevel cache system. In the past two decades, several classic LRU-based multilevel cache policies have been proposed to improve the overall I/O performance of storage systems. However, few power-aware multilevel cache policies focus on the storage devices in the bottom level, which consume more than 27% of the energy of the whole system [1].

    To address this problem, we propose a novel power-aware multilevel cache (PAM) policy that can reduce the energy consumption of high-performance and I/O bandwidth storage devices. In our PAM policy, an appropriate number of cold dirty blocks in the upper level cache are identified and selected to flush directly to the storage devices, providing high probability extension of the lifetime of disks in standby mode. To demonstrate the effectiveness of our proposed policy, we conduct several simulations with real-world traces. Compared to existing popular cache schemes such as PALRU, PB-LRU, and Demote, PAM reduces power consumption by up to 15% under different I/O workloads, and improves energy efficiency by up to 50.5%.

  • RESEARCH ARTICLE
    Wen LIU, Tuqian ZHANG, Yanming SHEN, Peng WANG

    In recent years, the rapid development of Internet of Things and sensor networks makes the time series data experiencing explosive growth. OpenTSDB and other emerging systems begin to use Hadoop, HBase to store massive time series data, and how to use these platforms to query and mine time series data has become a current research hotspot. As a typical time series distance measurementmethod, correlation coefficient is widely used in various applications. However, it requires a large amount of I/O and network transmission to compute the correlation coefficient of long time sequence on HBase in real time, and therefore cannot be applied to interactive query. To address this problem, in this paper, we present two methods to estimate the correlation coefficients of two sequences on HBase. We first propose a fast estimation algorithm for the upper and lower bounds of correlation coefficient, named as DCE. In order to further reduce the cost of I/O, we extend the DCE algorithm, and propose the ADCE algorithm, which can estimate the correlation coefficient quickly with an iterative manner. Experiments show that the algorithms proposed in this paper can quickly calculate the correlation coefficient of the long time series.

  • RESEARCH ARTICLE
    Momeng LIU, Yupu HU

    As a fundamental cryptographic primitive, oblivious transfer (OT) is developed for the sake of efficient usability and combinational feasibility. However, most OT protocols are built upon some quantum non-immune cryptosystems by assuming the hardness of discrete logarithm or factoring problem, whose security will break down directly in the quantum setting. Therefore, as a subarea of postquantum cryptography, lattice-based cryptography is viewed as a promising alternative and cornerstone to support for building post-quantum protocols since it enjoys some attractive properties, such as provable security against quantum adversaries and lower asymptotic complexity.

    In this paper, we first build an efficient 1-out-of-2 OT protocol upon the hardness of ring learning with errors (RLWE) problem, which is at least as hard as some worst-case ideal lattice problems. We show that this 1-out-of-2 OT protocol can be universally composable and secure against static corruptions in the random oracle model. Then we extend it to a general case, i.e., 1-out-of-N OT with achieving the same level of security. Furthermore, on the basis of the above OT structure, we obtain two improved OT protocols using two improved lattice-based key exchange protocols (respectively relying on the RLWE problem and learning with errors (LWE) problem, and both achieving better efficiency by removing the Gaussian sampling for saving cost) as building blocks. To show that our proposed OT protocol indeed achieves comparable security and efficiency, we make a comparison with another two lattice-based OT protocols in the end of the paper.

    With concerning on the potential threat from quantum computing and expecting on the practical use of OT with high efficiency, an efficient post-quantum OT protocol is pressing needed. As shown in this paper, our proposed OT protocols may be considered as post-quantumOT candidates since they can both preserve provable security relying on lattice problems and enjoy practical efficiency.

  • LETTER
    Shenyi QIAN, Daiyi LI, Tong SUN, Bin YU