Test-case prioritization, proposed at the end of the last century, schedules the execution order of test cases so as to improve test effectiveness. In the past years, test-case prioritization has gained much attention and has achieved significant results in five aspects: prioritization algorithms, coverage criteria, measurement, practical concerns, and application scenarios. In this article, we first review the achievements of test-case prioritization in these five aspects and then give our perspectives on its challenges.
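To illustrate the kind of prioritization algorithm this survey covers, here is a minimal sketch of the classic greedy "additional coverage" strategy: repeatedly pick the test that covers the most not-yet-covered statements. The function name and data layout are our own illustration, not taken from the article.

```python
def prioritize(tests, coverage):
    """Greedy additional-coverage prioritization.

    tests: list of test names.
    coverage: dict mapping each test name to the set of statements it covers.
    Returns the tests reordered so each pick maximizes newly covered statements.
    """
    remaining = list(tests)
    covered = set()
    order = []
    while remaining:
        # Pick the test contributing the most statements not yet covered.
        best = max(remaining, key=lambda t: len(coverage[t] - covered))
        order.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return order
```

With `coverage = {"t1": {1, 2}, "t2": {1, 2, 3, 4}, "t3": {5}}`, the ordering is `["t2", "t3", "t1"]`: t2 covers the most statements, t3 then adds one new statement, and t1 adds nothing new.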
The numerous works on media retargeting call for a thorough and comprehensive survey that reviews and categorizes existing works and provides insights to guide the future design of retargeting approaches and their applications. First, we present the basic problem of media retargeting and detail the state-of-the-art retargeting methods devised to solve it. Second, we review recent works on objective quality assessment of media retargeting, where we find that although these works are designed to bring the objective assessment results into accordance with subjective evaluation, they are only suitable for certain situations. Considering the subjective nature of aesthetics, designing objective assessment metrics for media retargeting could be a promising area for investigation. Third, we elaborate on other applications extended from retargeting techniques. We show how to apply retargeting techniques in other fields to solve their challenging problems, and argue that retargeting is not just a simple scaling algorithm but a flexible and widely useful concept. We believe this review can help researchers and practitioners solve the existing problems of media retargeting and bring new ideas to their work.
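One of the best-known content-aware retargeting methods is seam carving, whose core step is a dynamic program that finds the connected vertical seam of minimum total energy. A minimal sketch of that step on a plain 2-D energy grid (the representation and function name are illustrative, not from any particular surveyed paper):

```python
def min_seam(energy):
    """Return the column index per row of the minimum-energy vertical seam.

    energy: list of rows, each a list of non-negative pixel energies.
    Each seam step may move at most one column left or right (connectivity).
    """
    rows, cols = len(energy), len(energy[0])
    cost = [energy[0][:]]  # cumulative minimum cost, row by row
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            # Cheapest reachable cell in the previous row: c-1, c, or c+1.
            best = min(prev[max(c - 1, 0):min(c + 2, cols)])
            row.append(energy[r][c] + best)
        cost.append(row)
    # Backtrack from the cheapest bottom cell.
    seam = [min(range(cols), key=lambda c: cost[-1][c])]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        choices = range(max(c - 1, 0), min(c + 2, cols))
        seam.append(min(choices, key=lambda cc: cost[r][cc]))
    return seam[::-1]
```

Removing the returned seam from every row shrinks the image by one column while preserving high-energy (salient) content, which is the basic idea the more sophisticated surveyed methods build on.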
The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform, called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. Besides, it offers a large set of configurable network parameters in terms of topology and routing, and supports routers' on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. In addition, we simulate network behaviors with different network structures and low-power modes. The results are consistent with the theoretical analyses.
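To give a flavor of what cycle-granularity network simulation computes, here is a deliberately tiny queueing sketch for a single output port with a fixed per-packet service time. This is our own toy illustration of the modeling style, not HPC-NetSim's actual implementation.

```python
def simulate_port(arrivals, service_cycles=1):
    """Toy single-port model: one packet departs every `service_cycles` cycles.

    arrivals: sorted list of packet arrival cycles at the port.
    Returns the departure cycle of each packet, accounting for queueing.
    """
    departures = []
    free_at = 0  # first cycle at which the port is free again
    for t in arrivals:
        start = max(t, free_at)       # wait in the queue if the port is busy
        free_at = start + service_cycles
        departures.append(free_at)
    return departures
```

Even this toy exposes the contention effect a real simulator measures: two packets arriving at cycle 0 with a 2-cycle service time depart at cycles 2 and 4, and the second packet's latency doubles purely from queueing.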
Fault localization is an important and challenging task during software testing. Among the techniques studied in this field, program-spectrum-based fault localization is a promising approach. To perform spectrum-based fault localization, a set of test oracles should be provided, and the effectiveness of fault localization depends highly on the quality of those oracles. Moreover, effectiveness usually degrades when multiple simultaneous faults are present. Faced with multiple faults, it is difficult for developers to determine when to stop the fault localization process. To address these issues, we propose an iterative fault localization process, i.e., an iterative process of selecting test cases for effective fault localization (IPSETFUL), to identify as many faults as possible in the program until a stopping criterion is satisfied. It is performed based on a concept lattice of program spectrum (CLPS) proposed in our previous work. Based on the labeling approach of the CLPS, program statements are categorized as dangerous, safe, and sensitive statements. To identify the faults, developers need to check the dangerous statements. Meanwhile, developers select from the original test suite a set of test cases covering the dangerous or sensitive statements, and a new CLPS is generated for the next iteration. The iterative process continues until there are no failing tests in the test suite and all statements on the CLPS become safe statements. We conduct an empirical study on several subject programs, and the results show that IPSETFUL can help identify most of the faults in the program with the given test suite. Moreover, it can save much effort in inspecting non-faulty program statements compared with existing spectrum-based fault localization techniques and the relevant state-of-the-art technique.
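For readers unfamiliar with program spectra, the standard building block is a per-statement suspiciousness score computed from execution counts. A common choice is the Ochiai formula, shown below as a self-contained sketch; IPSETFUL itself works on a concept lattice rather than this plain ranking, so this is background, not the paper's algorithm.

```python
import math

def ochiai(ef, ep, nf):
    """Ochiai suspiciousness of one statement from spectrum counts.

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    nf: failing tests that do NOT execute the statement
    Score is in [0, 1]; higher means more likely faulty.
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0
```

A statement executed by every failing test and no passing test (e.g. `ochiai(2, 0, 0)`) scores 1.0, while one executed only by passing tests scores 0.0, which matches the intuition that coverage correlated with failure is suspicious.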
Automatic facial expression recognition (FER) from non-frontal views is a challenging research topic that has recently started to attract the attention of the research community. Pose variations are difficult to tackle, and many face analysis methods require sophisticated normalization and initialization procedures. Head-pose-invariant facial expression recognition therefore continues to be an issue for traditional methods. In this paper, we propose a novel approach for pose-invariant FER based on pose-robust features learned by two deep learning methods, the principal component analysis network (PCANet) and convolutional neural networks (CNN) (PRP-CNN). In the first stage, unlabeled frontal face images are used to learn features with PCANet. In the second stage, these features are used as the target of a CNN to learn a feature mapping between frontal and non-frontal faces. We then describe non-frontal face images using the novel descriptions generated by the learned maps, obtaining unified descriptors for arbitrary face images. Finally, the pose-robust features are used to train a single classifier for FER instead of training multiple models for each specific pose. On the whole, our method does not require pose/landmark annotation and can recognize facial expressions in a wide range of orientations. Extensive experiments on two public databases show that our framework yields dramatic improvements in facial expression analysis.
Multi-label learning is an effective framework for learning with objects that have multiple semantic labels, and has been successfully applied to many real-world tasks. In contrast with traditional single-label learning, the cost of labeling a multi-label example is rather high, so it becomes an important task to train an effective multi-label learning model with as few labeled examples as possible. Active learning, which actively selects the most valuable data and queries their labels, is the most important approach to reducing labeling cost. In this paper, we propose a novel approach, MADM, for batch-mode multi-label active learning. On one hand, MADM exploits representativeness and diversity in both the feature and label space by matching the distribution between labeled and unlabeled data. On the other hand, it tends to query predicted positive instances, which are expected to be more informative than negative ones. Experiments on benchmark datasets demonstrate that the proposed approach can reduce labeling cost significantly.
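The two ingredients named above, preferring predicted positives and keeping the batch diverse, can be caricatured in a few lines. This greedy sketch is a crude stand-in of our own making, not MADM's actual distribution-matching objective: it scans candidates by predicted positive score and keeps one only if it is far enough from those already picked.

```python
def select_batch(candidates, k=2, min_dist=1.0):
    """Greedy batch selection favoring diverse predicted positives.

    candidates: list of (positive_score, feature_vector) pairs.
    k: batch size; min_dist: minimum Euclidean spacing inside the batch.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    picked = []
    # Highest predicted-positive score first.
    for score, feat in sorted(candidates, key=lambda c: -c[0]):
        if all(dist(feat, p) >= min_dist for _, p in picked):
            picked.append((score, feat))
        if len(picked) == k:
            break
    return picked
```

Given two near-duplicate high-score points and one distant moderate-score point, the sketch queries one of the duplicates plus the distant point, trading a little score for coverage of the feature space.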
Canonical correlation analysis (CCA) is one of the most well-known methods for extracting features from multi-view data and has attracted much attention in recent years. However, classical CCA is unsupervised and does not take discriminant information into account. In this paper, we add discriminant information to CCA by using random cross-view correlations between within-class samples, and propose a new method for multi-view dimensionality reduction called canonical random correlation analysis (RCA). In RCA, two approaches for randomly generating cross-view correlation samples are developed on the basis of the bootstrap technique. Furthermore, kernel RCA (KRCA) is proposed to extract nonlinear correlations between different views. Experiments on several multi-view data sets show the effectiveness of the proposed methods.
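As background for the classical, unsupervised CCA that RCA extends, the canonical correlations between two views are the singular values of the whitened cross-covariance matrix. A small NumPy sketch (our own illustration; RCA's randomized within-class correlations are not shown here):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Classical CCA: canonical correlations between views X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-12, None))) @ V.T

    # Whiten each view, then read correlations off the SVD.
    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)
```

As a sanity check, if one view is an invertible linear transform of the other, every canonical correlation is 1, since whitening removes the transform entirely.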
Plans with loops are more general and compact than classical sequential plans, and are gaining increasing attention in artificial intelligence (AI). While many existing approaches focus mainly on algorithmic issues, little work has been devoted to the semantic foundations of planning with loops. In this paper, we first develop a tailored action language
Today’s ubiquitous online social networks serve multiple purposes, including social communication (Facebook, Renren) and news dissemination (Twitter). But how does a social network’s design define its functionality? Answering this question would require social network providers to take a proactive role in defining and guiding user behavior.
In this paper, we take a first step toward answering this question with a data-driven approach, through measurement and analysis of the Sina Weibo microblogging service. Often compared to Twitter because of its format, Weibo is interesting for our analysis because it serves both as a social communication tool and as a platform for news dissemination. While similar to Twitter in functionality, Weibo provides a distinguishing feature, comments, allowing users to form threaded conversations around a single tweet. Our study focuses on this feature and how it contributes to interactions and improves social engagement. We use analysis of comment interactions to uncover their role in social interactivity, and use comment graphs to demonstrate the structure of Weibo users’ interactions. Finally, we present a case study that shows the impact of comments on malicious user detection, a key application on microblogging systems: using properties of comments significantly improves accuracy in both modeling and detection of malicious users.
Cloud computing is becoming a very popular term in industry and is receiving a large amount of attention from the research community. Replica management is one of the most important issues in the cloud; it can offer fast data access time, high data availability, and reliability. By keeping all replicas active, the system may enhance its task success rate if the replicas and requests are reasonably distributed. However, appropriate replica placement in large-scale, dynamically scalable, and totally virtualized data centers is much more complicated. To provide cost-effective availability, minimize the response time of applications, and balance load for cloud storage, a new replica placement strategy is proposed. The placement is based on five important parameters: mean service time, failure probability, load variance, latency, and storage usage. However, replication should be used wisely because the storage size of each site is limited; thus, a site must keep only the important replicas. We also present a new replica replacement strategy based on the availability of the file, the last time the replica was requested, the number of accesses, and the size of the replica. We evaluate our algorithm using the CloudSim simulator and find that it offers better performance than other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.
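One simple way to combine the five placement parameters listed above is a weighted cost per candidate site, minimized over candidates. The weights, field names, and normalization below are hypothetical illustrations of the idea, not the paper's actual formula.

```python
def placement_cost(site, weights=None):
    """Weighted cost over five normalized-to-[0,1] parameters (lower is better).

    site: dict with keys service_time, failure_prob, load_variance,
    latency, storage. Weights are illustrative defaults.
    """
    w = weights or {"service_time": 0.3, "failure_prob": 0.25,
                    "load_variance": 0.2, "latency": 0.15, "storage": 0.1}
    return sum(w[k] * site[k] for k in w)

def choose_site(sites):
    """sites: list of (name, parameter-dict); return the cheapest site's name."""
    return min(sites, key=lambda s: placement_cost(s[1]))[0]
```

Real placement policies are more subtle (the parameters interact, and weights may adapt to workload), but a scalarized cost like this is the usual starting point for comparing candidate sites.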
To extend the lifetime of wireless sensor networks, reducing and balancing energy consumption are the main concerns in data collection due to the power constraints of the sensor nodes. Unfortunately, existing data collection schemes mainly focus on energy saving but overlook balancing the energy consumption of the sensor nodes. In addition, most of them assume that each sensor has global knowledge about the network topology. However, in many real applications such global knowledge is hard to obtain due to the dynamic nature of wireless sensor networks. In this paper, we propose an approximate self-adaptive data collection technique (ASA) to approximately collect data in a distributed wireless sensor network. ASA exploits the spatial correlations between sensors to provide an energy-efficient and balanced route to the sink, while no sensor needs any global knowledge of the network. We also show that ASA is robust to failures. Our experimental results demonstrate that ASA can provide significant communication (and hence energy) savings and balanced energy consumption across the sensor nodes.
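The basic trade the abstract describes, exploiting spatial correlation to suppress transmissions at a bounded accuracy cost, can be sketched in a few lines. Everything here (representative assignment, the epsilon threshold, the data layout) is our own illustration, not ASA's actual protocol.

```python
def collect(values, reps, eps=0.5):
    """Correlation-based suppression: a node stays silent when a designated
    representative neighbor's reading approximates its own within eps.

    values: node -> current reading; reps: node -> representative node.
    Returns (reported readings, sink's approximation of every node).
    """
    reported = {n: v for n, v in values.items()
                if abs(v - values[reps[n]]) > eps}
    # The sink substitutes the representative's value for silent nodes.
    approx = {n: reported.get(n, values[reps[n]]) for n in values}
    return reported, approx
```

With three nodes where two readings are nearly equal, only the outlier transmits, so communication drops while every reconstructed value stays within eps of the truth (for nodes whose representative itself reports or is its own representative).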
Solid state disks (SSDs) are becoming one of the mainstream storage devices due to their salient features, such as high read performance and low power consumption. To obtain high write performance and extend flash lifespan, SSDs leverage an internal DRAM to buffer frequently rewritten data, reducing the number of program operations upon the flash. However, existing buffer management algorithms fail to leverage data access features to predict data attributes. In various real-world workloads, most large sequential write requests are rarely rewritten in the near future. Once these write requests occur, much hot data will be evicted from DRAM into flash memory, jeopardizing overall system performance. To address this problem, we propose a novel large-write-data identification scheme, called Prober. This scheme probes large sequential write sequences among the write streams at an early stage to prevent them from residing in the buffer. Meanwhile, to further release space and reduce the waiting time for incoming requests, we temporarily buffer the large data in DRAM when the buffer has free space, and leverage an active write-back scheme for large sequential write data when the flash array becomes idle. Experimental results demonstrate that our schemes improve the hit ratio of write requests by up to 10%, decrease the average response time by up to 42%, and reduce the number of erase operations by up to 11%, compared with state-of-the-art buffer replacement algorithms.
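Detecting "large sequential write" runs in a request stream amounts to finding chains of address-contiguous writes above a length threshold. The sketch below is a hedged illustration of that detection step only; the function name, the `min_len` threshold, and the request layout are our assumptions, not Prober's actual parameters or logic.

```python
def find_large_sequential(writes, min_len=8):
    """Return the start indices of runs of address-contiguous writes.

    writes: list of (lba, size) write requests in arrival order.
    A request continues a run when its LBA equals the previous request's
    LBA plus its size; runs shorter than min_len are ignored.
    """
    runs, start = [], 0
    for i in range(1, len(writes) + 1):
        prev_lba, prev_size = writes[i - 1]
        contiguous = i < len(writes) and writes[i][0] == prev_lba + prev_size
        if not contiguous:
            if i - start >= min_len:
                runs.append(start)  # run spans writes[start:i]
            start = i
    return runs
```

A buffer manager in this style would bypass (or eagerly write back) requests inside the flagged runs, keeping DRAM free for hot, frequently rewritten data.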