Feb 2020, Volume 14 Issue 1
    

  • RESEARCH ARTICLE
    Yixuan TANG, Zhilei REN, Weiqiang KONG, He JIANG

    Compilers are widely used infrastructure for accelerating software development and are expected to be trustworthy. In the literature, various testing techniques have been proposed to guarantee the quality of compilers. However, it remains difficult to comprehensively characterize and understand compiler testing. To overcome this obstacle, we propose a literature analysis framework to gain insights into the compiler testing area. First, we perform an extensive search to construct a dataset of compiler testing papers. Then, we conduct a bibliometric analysis of the productive authors, the influential papers, and the frequently tested compilers based on our dataset. Finally, we utilize association rules and collaboration networks to mine authorships and the communities of interest among researchers and keywords. Several valuable findings are reported. We find that the USA is the leading country, home to the most influential researchers and institutions. The most active keyword is “random testing”. We also find that most researchers in the compiler testing area have broad interests but work within small collaboration groups.
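
    The collaboration-network part of such an analysis can be illustrated with standard graph tooling. Below is a minimal Python sketch (with a hypothetical papers author-list dataset) that builds a weighted co-authorship graph and extracts researcher communities; it shows the general technique, not the authors' exact pipeline.

        # Minimal sketch: a co-authorship collaboration network with
        # community detection (hypothetical data, illustrative only).
        from itertools import combinations
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        papers = [                     # each entry: the author list of one paper
            ["A. Smith", "B. Jones"],
            ["A. Smith", "C. Lee", "B. Jones"],
            ["D. Kim", "C. Lee"],
        ]

        G = nx.Graph()
        for authors in papers:
            for a, b in combinations(authors, 2):
                # Edge weight counts how often two authors co-publish.
                w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
                G.add_edge(a, b, weight=w + 1)

        # Communities of interest among researchers (greedy modularity).
        for community in greedy_modularity_communities(G, weight="weight"):
            print(sorted(community))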

  • RESEARCH ARTICLE
    Samuel IRVING, Bin LI, Shaoming CHEN, Lu PENG, Weihua ZHANG, Lide DUAN

    Performance variability, stemming from nondeterministic hardware and software behaviors or from deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems. It increases the difficulty of comparing computer performance metrics and is slated to become even more of a concern as interest in Big Data analytics increases. Conventional methods use various measures (such as the geometric mean) to quantify the performance of different benchmarks and compare computers without considering this variability, which may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution with a five-number summary for performance evaluation. The results show that for both PARSEC and high-variance BigDataBench benchmarks 1) the randomization test substantially improves our chance of identifying a difference between performance comparisons when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g., the ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of computer performance due to the variability of computer systems. We further propose using an empirical distribution to evaluate computer performance and a five-number summary to summarize it. We use published SPEC 2006 results to investigate the sources of performance variation by predicting performance and relative variation for 8,236 machines. We achieve a correlation of predicted performances of 0.992 and a correlation of predicted and measured relative variation of 0.5. Finally, we propose a novel biplotting technique to visualize the effectiveness of benchmarks and to cluster machines by behavior. We illustrate the results and conclusions through detailed Monte Carlo simulation studies and real examples.
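
    As a rough illustration of the bootstrapping component, the following sketch (with assumed benchmark runtimes, not the paper's data) estimates a 95% confidence interval for the ratio of geometric means between two machines.

        # Minimal sketch: bootstrap confidence interval for the ratio of
        # geometric means of two machines' benchmark runtimes (assumed data).
        import numpy as np

        rng = np.random.default_rng(0)
        machine_a = np.array([12.1, 11.8, 12.4, 12.0, 11.9])  # runtimes on A
        machine_b = np.array([11.5, 11.9, 11.6, 11.4, 11.8])  # runtimes on B

        def geo_mean(x):
            return np.exp(np.mean(np.log(x)))

        ratios = []
        for _ in range(10_000):
            a = rng.choice(machine_a, size=machine_a.size, replace=True)
            b = rng.choice(machine_b, size=machine_b.size, replace=True)
            ratios.append(geo_mean(a) / geo_mean(b))

        low, high = np.percentile(ratios, [2.5, 97.5])
        print(f"95% bootstrap CI for the GM ratio: [{low:.3f}, {high:.3f}]")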

  • RESEARCH ARTICLE
    Jialei LIU, Shangguang WANG, Ao ZHOU, Jinliang XU, Fangchun YANG

    Since service level agreements (SLAs) are essentially used to maintain a reliable quality of service between cloud providers and clients, there has been a growing effort to reduce power consumption while complying with the SLA by maximizing physical machine (PM)-level utilization and applying load balancing techniques in infrastructure as a service. However, with the recent introduction of container as a service by cloud providers, containers are increasingly popular and will become the major deployment model in the cloud environment, specifically in platform as a service. Therefore, reducing power consumption while complying with the SLA at the virtual machine (VM) level becomes essential. In this context, we propose a container consolidation scheme with usage prediction to achieve these objectives. To obtain a reliable characterization of overutilized and underutilized PMs, our scheme jointly exploits the current and predicted CPU utilization, based on the local history of the considered PMs, during container consolidation. We demonstrate our solution through simulations on real workloads. The experimental results show that the container consolidation scheme with usage prediction reduces power consumption, the number of container migrations, and the average number of active VMs while complying with the SLA.
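
    The "current plus predicted utilization" test can be pictured with a small sketch. The thresholds and the linear-trend predictor below are assumptions for illustration; the paper's actual predictor is based on the local history of each PM.

        # Minimal sketch: classify a PM by combining its current CPU
        # utilization with a one-step prediction from local history
        # (illustrative thresholds and predictor).
        import numpy as np

        OVER, UNDER = 0.8, 0.2  # assumed utilization thresholds

        def predict_next(history):
            """One-step extrapolation with a least-squares linear fit."""
            t = np.arange(len(history))
            slope, intercept = np.polyfit(t, history, 1)
            return slope * len(history) + intercept

        def classify_pm(history):
            current, predicted = history[-1], predict_next(history)
            if current > OVER and predicted > OVER:
                return "overutilized"   # migrate some containers away
            if current < UNDER and predicted < UNDER:
                return "underutilized"  # consolidate and power down
            return "normal"

        print(classify_pm([0.70, 0.75, 0.82, 0.85, 0.88]))  # overutilized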

  • RESEARCH ARTICLE
    Tun LI, Jun YE, Qingping TAN

    In the development of a system-on-a-chip (SoC) design, a family of SystemC transaction-level models (TLMs) is often created. TLMs in the same family often share common functionality but differ in timing, implementation, configuration, and performance across the various SoC development phases. In most cases, all the TLMs in a family must be verified for the follow-up design activities. In our previous work, we proposed to call such a family a TLM product line (TPL) and proposed a feature-oriented (FO) design methodology for efficient TPL development. However, developers can only verify the TLMs in a family one by one, which incurs a large amount of duplicated verification overhead. Therefore, in our proposed methodology, functional verification of TPLs has become a bottleneck. In this paper, we propose a novel TPL verification method for FO designs. In our method, for a given property, we can exponentially reduce the number of TLMs to be verified by identifying mute-feature-modules (MFMs), which avoids duplicated verification. The proposed method is presented both informally and formally, and its correctness is proved. Theoretical analysis and experimental results on a real design show the correctness and efficiency of the proposed method.

  • RESEARCH ARTICLE
    Shaocheng GUO, Songcan CHEN, Qing TIAN

    Ordinal regression (OR), or ordinal classification, is a machine learning paradigm for ordinal labels. To date, a variety of methods have been proposed, including kernel-based and neural-network-based methods with strong performance. However, existing OR methods rarely consider the latent structures of the given data, particularly the interaction among covariates, and thus lose interpretability to some extent. To compensate for this, in this paper we present a new OR method: the ordinal factorization machine with hierarchical sparsity (OFMHS), which combines a factorization machine with hierarchical sparsity to explore the hierarchical structure behind the input variables. For optimization, we formulate OFMHS as a convex optimization problem and solve it with the efficient alternating direction method of multipliers (ADMM) algorithm. Experimental results on synthetic and real datasets demonstrate the superiority of our method in both performance and the selection of significant variables.
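
    For reference, ADMM applied to a generic convex problem \min_{x,z} f(x) + g(z) subject to Ax + Bz = c iterates the standard scaled-dual updates below; this is textbook ADMM, not the paper's specific OFMHS objective.

        \begin{aligned}
        x^{k+1} &= \operatorname*{arg\,min}_{x}\; f(x) + \tfrac{\rho}{2}\,\lVert Ax + Bz^{k} - c + u^{k}\rVert_2^2,\\
        z^{k+1} &= \operatorname*{arg\,min}_{z}\; g(z) + \tfrac{\rho}{2}\,\lVert Ax^{k+1} + Bz - c + u^{k}\rVert_2^2,\\
        u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c.
        \end{aligned}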

  • RESEARCH ARTICLE
    Lydia LAZIB, Bing QIN, Yanyan ZHAO, Weinan ZHANG, Ting LIU

    The automatic detection of negation is a crucial task in a wide range of natural language processing (NLP) applications, including medical data mining, relation extraction, question answering, and sentiment analysis. In this paper, we present a syntactic-path-based hybrid neural network architecture, a novel approach to identifying the scope of negation in a sentence. Our hybrid architecture captures the salient information needed to determine whether a token is in the scope or not, without relying on any human intervention. The approach combines a bidirectional long short-term memory (Bi-LSTM) network and a convolutional neural network (CNN). The CNN model captures relevant syntactic features between the token and the cue along the shortest syntactic path in both constituency and dependency parse trees. The Bi-LSTM learns the context representation along the sentence in both the forward and backward directions. We evaluate our model on the BioScope corpus and achieve a 90.82% F-score (78.31% PCS) on the abstract sub-corpus, outperforming feature-dependent approaches.
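
    A minimal PyTorch sketch of such a CNN + Bi-LSTM hybrid is given below; the layer sizes, the pooling choice, and the shared embedding are assumptions for illustration, not the paper's exact configuration.

        # Minimal sketch: per-token negation-scope tagging with a CNN over
        # the token-to-cue syntactic path plus a sentence-level Bi-LSTM.
        import torch
        import torch.nn as nn

        class NegationScopeModel(nn.Module):
            def __init__(self, vocab=5000, emb=100, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(vocab, emb)
                self.conv = nn.Conv1d(emb, hidden, kernel_size=3, padding=1)
                self.lstm = nn.LSTM(emb, hidden, bidirectional=True,
                                    batch_first=True)
                self.out = nn.Linear(hidden * 3, 2)  # in scope / out of scope

            def forward(self, sentence, paths):
                # sentence: (batch, L) token ids; paths: (batch, L, P) path ids
                ctx, _ = self.lstm(self.embed(sentence))              # (b, L, 2h)
                b, L, P = paths.shape
                p = self.embed(paths.view(b * L, P)).transpose(1, 2)  # (bL, emb, P)
                p = torch.relu(self.conv(p)).max(dim=2).values        # (bL, h)
                return self.out(torch.cat([ctx, p.view(b, L, -1)], dim=-1))

        model = NegationScopeModel()
        logits = model(torch.randint(0, 5000, (2, 12)),
                       torch.randint(0, 5000, (2, 12, 10)))
        print(logits.shape)  # torch.Size([2, 12, 2])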

  • RESEARCH ARTICLE
    Yi ZHOU, Qichuan GENG, Zhong ZHOU, Wei WU

    Nighttime images are difficult to process due to insufficient brightness, heavy noise, and lack of detail, so they are usually excluded from time-lapse image analysis. Interestingly, however, nighttime images carry unique building features: robust and salient lighting cues produced by human activity. Lighting variation reflects both collective and individual patterns of habitation, and it has an inherent man-made repetitive structure rooted in architectural theory. Inspired by this, we propose an automatic nighttime façade recovery method that exploits the lattice structure of window lighting. First, a simple but efficient classification method is employed to determine the salient bright regions, which may be lit windows. Then we group windows into multiple lattice proposals with respect to façades by patch matching, followed by greedily removing overlapping lattices. Using the horizon constraint, we resolve ambiguous proposals and obtain the correct orientation. Finally, we complete the generated façades by filling in the missing windows. This method is well suited for urban environments, and the results can be used as a good single-view compensation for daytime images. The method can also act as a semantic input to other learning-based 3D image reconstruction techniques. Experiments demonstrate that our method works well on nighttime image datasets: we obtain a high lattice detection rate of 82.1% on 82 challenging images with a low mean orientation error of 12.1±4.5 degrees.
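
    The first step, finding salient bright regions, can be approximated with simple thresholding and connected components, as in the sketch below (illustrative threshold and size filter; not the paper's classifier).

        # Minimal sketch: lit-window candidates as bright connected
        # components in a grayscale nighttime image.
        import numpy as np
        from scipy import ndimage

        def bright_regions(gray, thresh=200, min_area=20):
            """gray: 2-D uint8 array; returns bounding slices of bright blobs."""
            labels, _ = ndimage.label(gray >= thresh)   # connected components
            boxes = []
            for sl in ndimage.find_objects(labels):
                if (labels[sl] > 0).sum() >= min_area:  # drop tiny noise blobs
                    boxes.append(sl)
            return boxes

        img = np.zeros((100, 100), dtype=np.uint8)
        img[10:20, 30:40] = 255        # one synthetic lit window
        print(bright_regions(img))     # [(slice(10, 20, None), slice(30, 40, None))]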

  • RESEARCH ARTICLE
    Yuanrui ZHANG, Frédéric MALLET, Yixiang CHEN

    The Spatio-Temporal Consistency Language (STeC) is a high-level modeling language that deals natively with spatio-temporal behaviour, i.e., behaviour related to certain locations and times. Such constraints on both location and time are of primary importance for some types of real-time systems. CCSL is a formal specification language based on logical clocks. Owing to its expressive power over logical and chronometric time constraints, it is used to describe crucial safety properties of real-time systems. We propose a novel verification framework combining STeC and CCSL, which has the advantages of addressing the spatio-temporal consistency of system behaviour while easily expressing crucial time constraints. We develop a theory combining the two languages and a method for verifying CCSL properties in STeC models. We adopt UPPAAL as the model checking tool and give a simple example to illustrate how verification is carried out in our framework.

  • RESEARCH ARTICLE
    Yudong QIN, Deke GUO, Zhiyao HU, Bangbang REN

    Compared with a series of independent unicast transfers, multicast transfer can efficiently save bandwidth and reduce the load on the source node. Nowadays, many applications employ a content replica strategy to improve robustness and efficiency; hence, each file and its replicas are usually distributed among multiple sources. In such scenarios, traditional deterministic multicast develops into uncertain multicast, which has more flexibility in source selection. In this paper, we focus on building and maintaining a minimal cost forest (MCF) for an uncertain multicast whose group members (source nodes and destination nodes) may join or leave after the MCF is constructed. We formulate this dynamic minimal cost forest (DMCF) problem as a mixed integer programming model and design three dedicated methods to approximate the optimal solution. Among them, the a-MCF method efficiently constructs an MCF for a given uncertain multicast, without considering dynamic behaviors of multicast group members. The d-MCF method slightly updates the existing MCF via local modifications whenever a dynamic behavior occurs, balancing minimal cost against minimal modifications to the existing forest. The r-MCF method supplements the d-MCF method: since many rounds of local modifications may drive the resultant forest far from the optimal one, it monitors the accumulated degradation and triggers a rearrangement process to reconstruct a new MCF when necessary. Comprehensive evaluation results demonstrate that our methods can effectively tackle the proposed DMCF problem.
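
    A naive baseline in the spirit of a-MCF can be sketched with multi-source shortest paths: route each destination to its cheapest replica source and take the union of the chosen paths (a simple heuristic for illustration, not the paper's algorithm).

        # Minimal sketch: a forest for an uncertain multicast built by
        # routing each destination to its cheapest source (toy graph).
        import networkx as nx

        G = nx.Graph()
        G.add_weighted_edges_from([
            ("s1", "a", 1), ("s2", "b", 1), ("a", "b", 2),
            ("a", "d1", 1), ("b", "d2", 1), ("d1", "d2", 3),
        ])
        sources, destinations = {"s1", "s2"}, ["d1", "d2"]

        forest = nx.Graph()
        for dest in destinations:
            # Cheapest path from *any* replica source to this destination.
            cost, path = nx.multi_source_dijkstra(G, sources, target=dest)
            forest.add_edges_from(zip(path, path[1:]))

        print(sorted(forest.edges()))  # union of the chosen paths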

  • RESEARCH ARTICLE
    Waixi LIU, Yu WANG, Jie ZHANG, Hongjian LIAO, Zhongwei LIANG, Xiaochu LIU

    When evaluating the performance of a distributed software-defined network (SDN) controller architecture in data center networks, the required number of controllers for a given network topology and their locations are major issues of interest. To address these issues, this study proposes adaptively adjusting and mapping controllers (AAMcon) to design a stateful data plane. We use complex-network community theory to select a key switch at which to place each controller, so that the controller is close to the switches it controls within a subnet. A physically distributed but logically centralized controller pool is built based on network function virtualization (NFV), and we then propose a fast-start/overload-avoidance algorithm to adaptively adjust the number of controllers according to demand. We also analyze AAMcon to find the optimal distance between switch and controller. Finally, experiments show the following results. (1) AAMcon closely follows the demand for controllers, and each controller responds to switch requests over the shortest distance, minimizing the switch-to-controller delay. (2) AAMcon shows good robustness under failures. (3) AAMcon achieves lower delay in networks with more pronounced community structure; in fact, the community modularity is inversely related to the average switch-to-controller distance, i.e., the average delay decreases as the community modularity increases. (4) AAMcon achieves load balance between the controllers. (5) Compared to DCP-GK and k-critical, AAMcon shows good performance.
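
    The community-based placement idea can be illustrated as follows: detect communities in the topology, then place one controller per community at the switch with the smallest average internal distance. The topology generator and the closeness criterion below are assumptions for illustration, not the full AAMcon algorithm.

        # Minimal sketch: one controller per community, placed at the
        # switch with maximal closeness (minimal average distance).
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        G = nx.barabasi_albert_graph(60, 2, seed=1)  # stand-in topology

        for community in greedy_modularity_communities(G):
            sub = G.subgraph(community)
            closeness = nx.closeness_centrality(sub)
            key_switch = max(closeness, key=closeness.get)
            print(f"{len(sub)} switches -> controller at node {key_switch}")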

  • REVIEW ARTICLE
    Lingli LI, Hongzhi WANG, Jianzhong LI, Hong GAO

    Uncertain data are data accompanied by uncertainty information, and they exist widely in database applications. In recent years, uncertainty in data has brought challenges to almost all areas of database management, such as data modeling, query representation, query processing, and data mining. There is no doubt that uncertain data management has become a hot research topic in the field of data management. In this study, we explore the problems of managing uncertain data, present state-of-the-art solutions, and provide future research directions in this area. The uncertain data management techniques discussed include data modeling, query processing, and data mining over uncertain data in relational, XML, graph, and stream forms.

  • RESEARCH ARTICLE
    Huiping LIU, Cheqing JIN, Aoying ZHOU

    With the increasing number of GPS-equipped vehicles, more and more trajectories are generated continuously, making urban applications such as route planning feasible. In general, a popular route that has been travelled frequently is a good choice, especially for people who are not familiar with the road network. Moreover, accurate estimation of the travel cost (such as travel time, travel fee, and fuel consumption) benefits a well-scheduled trip plan. In this paper, we address this issue by finding the popular route with travel cost estimation. To this end, we design a system consisting of three main components. First, we propose a novel structure, called the popular traverse graph, in which each node is a popular location and each edge is a popular route between locations, to summarize historical trajectories without road network information. Second, we propose a self-adaptive method to model the travel cost on each popular route over different time intervals, so that each time interval has a stable travel cost. Finally, given a query consisting of a source, a destination, and a leaving time, we devise an efficient route planning algorithm over the graph which considers optimal route concatenation to find the popular route from source to destination at the leaving time, with accurate travel cost estimation. We conduct comprehensive experiments and implement our system as a mobile app; the results show that our method is both effective and efficient.
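
    The query step can be pictured as a Dijkstra search over the popular traverse graph with per-interval edge costs. The toy graph, the two time intervals, and the peak-hour rule below are assumptions for illustration, not the paper's cost model or route-concatenation algorithm.

        # Minimal sketch: route query over a popular traverse graph whose
        # edges carry a travel cost per time interval (toy data).
        import heapq

        graph = {  # edge -> minutes per interval (0: off-peak, 1: peak)
            "A": {"B": [10, 18], "C": [15, 15]},
            "B": {"D": [12, 20]},
            "C": {"D": [10, 14]},
            "D": {},
        }

        def interval(t):
            return 1 if 480 <= t < 600 else 0  # assume 8-10 am is peak

        def popular_route(src, dst, leave):
            pq, best = [(leave, src, [src])], {}
            while pq:
                t, node, path = heapq.heappop(pq)
                if node == dst:
                    return path, t - leave        # route and total cost
                if best.get(node, float("inf")) <= t:
                    continue
                best[node] = t
                for nxt, costs in graph[node].items():
                    heapq.heappush(pq, (t + costs[interval(t)], nxt, path + [nxt]))
            return None, None

        print(popular_route("A", "D", leave=500))  # (['A', 'C', 'D'], 29)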

  • RESEARCH ARTICLE
    Xingyue CHEN, Tao SHANG, Feng ZHANG, Jianwei LIU, Zhenyu GUAN

    When users store data in big data platforms, the integrity of outsourced data is a major concern for data owners due to their lack of direct control over the data. However, existing remote data auditing schemes for big data platforms are only applicable to static data. In order to verify the integrity of dynamic data in a Hadoop big data platform, we present a dynamic auditing scheme meeting the special requirements of Hadoop. Concretely, a new data structure, namely the Data Block Index Table, is designed to support dynamic data operations on HDFS (Hadoop distributed file system), including appending, inserting, deleting, and modifying. Combined with the MapReduce framework, a dynamic auditing algorithm is then designed to audit the data on HDFS concurrently. Analysis shows that the proposed scheme is secure enough to resist forgery, replace, and replay attacks on a big data platform. It is also efficient in both computation and communication.
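
    The dynamic operations on the index table can be pictured with a small sketch. The row layout (block id, version, tag) and the SHA-256 stand-in tag below are assumptions for illustration, not the paper's exact Data Block Index Table or its authentication tags.

        # Minimal sketch: an index table supporting append, insert,
        # delete, and modify over data blocks (illustrative layout).
        import hashlib

        class BlockIndexTable:
            def __init__(self):
                self.rows = []  # ordered [block_id, version, tag] rows

            @staticmethod
            def _tag(data: bytes) -> str:
                return hashlib.sha256(data).hexdigest()  # stand-in tag

            def append(self, block_id, data):
                self.rows.append([block_id, 1, self._tag(data)])

            def insert(self, pos, block_id, data):
                self.rows.insert(pos, [block_id, 1, self._tag(data)])

            def delete(self, pos):
                del self.rows[pos]

            def modify(self, pos, data):
                self.rows[pos][1] += 1          # bump version on update
                self.rows[pos][2] = self._tag(data)

        table = BlockIndexTable()
        table.append("blk_0", b"hello")
        table.insert(0, "blk_1", b"world")
        table.modify(1, b"hello v2")
        print([(r[0], r[1]) for r in table.rows])  # [('blk_1', 1), ('blk_0', 2)]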

  • LETTER
    Chaker KATAR, Ahmed BADREDDINE
  • LETTER
    Zhefan ZHONG, Xin LIN, Liang HE
  • ERRATUM
    Xianfa CAI, Guihua WEN, Jia WEI, Zhiwen YU
  • ERRATUM
    Lijin WANG, Yilong YIN, Yiwen ZHONG
  • RETRACTION NOTE
    Awada UCHECHUKWU, Keqiu LI, Yanming SHEN
  • RETRACTION NOTE
    Awada UCHECHUKWU, Keqiu LI, Yanming SHEN