Network embedding, which aims to embed a given network into a low-dimensional vector space, has proved effective in various network analysis and mining tasks such as node classification, link prediction, and network visualization. Emerging network embedding methods have shifted their emphasis toward mature deep learning models. Neural-network-based network embedding has become a mainstream solution because of its high efficiency and its capability of preserving the nonlinear characteristics of the network. In this paper, we propose Adversarial Network Embedding using Structural Similarity (ANESS), a novel, versatile, low-complexity GAN-based network embedding model which utilizes the inherent vertex-to-vertex structural similarity of the network. ANESS learns robust and effective vertex embeddings via an adversarial training procedure. Specifically, our method aims to exploit the strength of generative adversarial networks in generating high-quality samples and to utilize the structural similarity of vertices to learn the latent representations of a network. Meanwhile, ANESS can dynamically update its strategy for generating samples during each training iteration. Extensive experiments have been conducted on several benchmark network datasets, and the empirical results demonstrate that ANESS significantly outperforms other state-of-the-art network embedding methods.
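The abstract does not give the ANESS objective, so the following is only a minimal, hedged numpy sketch of one common notion of vertex-to-vertex structural similarity (cosine similarity of adjacency rows), the kind of signal the abstract says the model exploits; the function name and the choice of cosine similarity are assumptions, not the authors' definition.

```python
import numpy as np

def structural_similarity(adj):
    """Cosine similarity between adjacency rows.

    An assumed (not ANESS-specific) proxy for vertex-to-vertex structural
    similarity: vertices with similar neighborhoods score high.
    """
    norms = np.linalg.norm(adj, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for isolated vertices
    unit = adj / norms
    return unit @ unit.T             # S[i, j] in [0, 1] for non-negative adjacency

# toy 4-vertex graph: vertices 0 and 1 share exactly the same neighbors
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
print(structural_similarity(A))      # entry (0, 1) is 1.0: structurally identical
```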
At present, most k-dominant Skyline query algorithms are oriented toward static datasets; this paper proposes a k-dominant Skyline query algorithm for dynamic datasets. The algorithm operates in a recursive, cyclic manner. First, we compute the dominance ability of each object and sort the objects in descending order of dominance ability. Then, we maintain an inverted index of dominance via the k-dominant Skyline point calculation algorithm. When the data change, the algorithm judges whether the updated point affects the k-dominant Skyline point set, and the k-dominant Skyline points of the new dataset are obtained by the insertion and deletion algorithms. The proposed algorithm resolves the maintenance issue of a frequently updated database by dynamically updating the dataset. The experimental results show that the query algorithm can effectively improve query efficiency.
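As a hedged illustration of the notions the abstract relies on, the sketch below uses the standard definition of k-dominance (better than or equal in at least k dimensions, strictly better in at least one of them, smaller being better) and approximates "dominance ability" by the number of other objects a point k-dominates; that proxy is an assumption, since the abstract does not define the measure.

```python
def k_dominates(p, q, k):
    """p k-dominates q (smaller is better) if, on at least k dimensions,
    p <= q everywhere and p < q on at least one of those dimensions."""
    better_eq = [i for i in range(len(p)) if p[i] <= q[i]]
    strictly = [i for i in better_eq if p[i] < q[i]]
    return len(better_eq) >= k and len(strictly) >= 1

def dominance_ability(points, idx, k):
    """Assumed proxy: how many other objects this object k-dominates."""
    return sum(k_dominates(points[idx], q, k)
               for j, q in enumerate(points) if j != idx)

def k_dominant_skyline(points, k):
    """Naive reference computation: keep points not k-dominated by any other,
    scanning objects in descending order of dominance ability."""
    order = sorted(range(len(points)),
                   key=lambda i: dominance_ability(points, i, k),
                   reverse=True)
    return [points[i] for i in order
            if not any(k_dominates(points[j], points[i], k)
                       for j in order if j != i)]

pts = [(1, 5, 3), (2, 2, 4), (3, 1, 1), (5, 5, 5)]
print(k_dominant_skyline(pts, k=2))
```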
During a two-day strategic workshop in February 2018, 22 information retrieval researchers met to discuss the future challenges and opportunities within the field. The outcome is a list of potential research directions, project ideas, and challenges. This report describes the major conclusions we reached during the workshop. A key result is that we need to open our minds to embrace a broader IR field by rethinking the definitions of information, retrieval, user, system, and evaluation in IR. By providing detailed discussions on these topics, this report is expected to inspire IR researchers in both academia and industry and to help the future growth of the IR research community.
Fingerprint matching, spoof mitigation, and liveness detection are among the most popular biometric techniques, mostly because of fingerprints' stability through life, their uniqueness, and their low risk of invasion. In the recent decade, several techniques have been presented to address these challenges on well-known datasets. This study provides a comprehensive review of the fingerprint algorithms and techniques published in the last few decades. It divides fingerprint research into nine approaches: feature based, fuzzy logic, holistic, image enhancement, latent, conventional machine learning, deep learning, template matching, and miscellaneous techniques. Among these, the deep learning approach has outperformed the others and gained significant attention for future research. The reviewed fingerprint literature is divided historically into four eras based on 106 referenced papers and their cumulative citations.
As the first attempt, this paper proposes a model for the Chinese high school timetabling problems (CHSTPs) under the new curriculum innovation launched in 2014 by the Chinese government. According to the new curriculum innovation, students in high school can choose the subjects they are interested in instead of being forced to select one of the two traditional study directions, namely Science and Liberal Arts. Meanwhile, they still need to attend the traditional compulsory subjects. CHSTPs are student-oriented and involve more student constraints, which makes them more complex than the typical "Class-Teacher model", in which the element "Teacher" is the primary constraint. In this paper, we first describe in detail the mathematical model of CHSTPs and then design a new two-part representation for the candidate solution. Based on the new representation, we adopt a two-phase simulated annealing (SA) algorithm to solve CHSTPs. A total of 45 synthetic instances with different numbers of classes and teachers and different levels of student constraints are generated and used to illustrate the characteristics of the CHSTP model and the effectiveness of the designed representation and algorithm. Finally, we apply the proposed model, the designed two-part representation, and the two-phase SA to 10 real high schools.
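Since the abstract only names a two-part representation and a two-phase SA without giving the CHSTP constraints, the following is a hedged, generic sketch: a standard SA loop applied in two phases, each phase perturbing one part of a two-part solution. The encoding (elective choices, timetable slots), the neighborhood moves, and the toy cost are placeholders, not the authors' formulation.

```python
import math, random

def simulated_annealing(init, neighbor, cost, t0=50.0, cooling=0.95, steps=1000):
    """Generic SA loop; a worse move is accepted with probability exp(-delta / T)."""
    current = best = init
    t = t0
    for _ in range(steps):
        cand = neighbor(current)
        delta = cost(cand) - cost(current)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current = cand
            if cost(current) < cost(best):
                best = current
        t *= cooling
    return best

def perturb(part, domain):
    """Re-assign one random position in the chosen part of the two-part solution."""
    def neighbor(sol):
        sol = (list(sol[0]), list(sol[1]))
        sol[part][random.randrange(len(sol[part]))] = random.randrange(domain)
        return (sol[0], sol[1])
    return neighbor

# placeholder cost: a real CHSTP cost would count violated student/teacher constraints
toy_cost = lambda sol: sum(sol[0]) + sum(sol[1])

init = ([2] * 5, [3] * 8)                 # part 1: elective choices, part 2: timetable slots
phase1 = simulated_annealing(init, perturb(0, 3), toy_cost)    # phase 1: optimize part 1
phase2 = simulated_annealing(phase1, perturb(1, 4), toy_cost)  # phase 2: optimize part 2
print(phase2)
```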
Density-based clustering algorithms (DBCLAs) rely on the notion of density to identify clusters of arbitrary shapes and sizes with varying densities. Existing surveys on DBCLAs cover only a selected set of algorithms and fail to provide extensive information about the variety of DBCLAs proposed to date, including a taxonomy of the algorithms. In this paper we present a comprehensive survey of DBCLAs over the last two decades along with their classification. We group the DBCLAs into four categories: density definition, parameter sensitivity, execution mode, and nature of data, and further divide them into various classes under each category. In addition, we compare the DBCLAs through their common features and variations, and through their citation and conceptual dependencies. We identify various application areas of DBCLAs in domains such as astronomy, earth sciences, molecular biology, geography, and multimedia. Our survey also identifies probable future directions of DBCLAs where the involvement of density-based methods may lead to favorable results.
Entity set expansion (ESE) aims to expand a seed set of entities to obtain more entities that share common properties. ESE is important for many applications such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods have employed knowledge graphs (KGs) to expand entity sets. However, they failed to effectively and efficiently utilize the rich semantics contained in a KG and ignored the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relations among seed entities in a KG and thus to retrieve candidate entities. We then rank the entities according to a meta-path-based structural similarity. Furthermore, to utilize the text descriptions of entities in Wikipedia, we propose an extended model, CoMeSE++, which combines the structural information revealed by a KG with the text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
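The abstract names a meta-path-based structural similarity without giving the formula. A standard choice for symmetric meta paths is PathSim, so the following numpy sketch is hedged under that assumption; the paper's exact multi-type meta-path weighting may differ.

```python
import numpy as np

def pathsim(commuting):
    """PathSim-style similarity from a symmetric meta-path commuting matrix M:
    sim(x, y) = 2 * M[x, y] / (M[x, x] + M[y, y]).  Assumed formula, used here
    only to illustrate meta-path-based structural similarity."""
    diag = np.diag(commuting)
    denom = diag[:, None] + diag[None, :]
    denom[denom == 0] = 1.0
    return 2.0 * commuting / denom

# toy meta path Entity-Category-Entity: A is the entity-category incidence matrix
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)
M = A @ A.T                                   # commuting matrix along the meta path
print(pathsim(M))                             # entities 0 and 2 come out maximally similar
```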
Next location prediction has aroused great interest in the era of the Internet of Things (IoT). With the ubiquitous deployment of sensor devices, e.g., GPS and Wi-Fi, the IoT environment offers new opportunities for proactively analyzing human mobility patterns and predicting users' future visits at low cost, both outdoors and indoors. In this paper, we consider the problem of next location prediction in the IoT environment in a session-based manner. We suggest that a user's future intention in each session can be inferred more accurately if the patterns hidden inside both the trajectory and the signal strength sequences collected from IoT devices are jointly modeled, which existing state-of-the-art methods have rarely addressed. To this end, we propose a trajectory and sIgnal sequence (TSIS) model, in which the trajectory transition regularities and the signal temporal dynamics are jointly embedded in a neural-network-based model. Specifically, we employ a gated recurrent unit (GRU) to capture the temporal dynamics in the multivariate signal strength sequence. Moreover, we adapt gated graph neural networks (gated GNNs) on location transition graphs to explicitly model the transition patterns of trajectories. Finally, the low-dimensional representations learned from the trajectory and the signal sequence are jointly optimized to construct a session embedding, which is further employed to predict the next location. Extensive experiments on two real-world Wi-Fi based mobility datasets demonstrate that TSIS is effective and robust for next location prediction compared with other competitive baselines.
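As a hedged PyTorch sketch of the ingredients the abstract describes, the module below runs a GRU over the multivariate signal-strength sequence, performs one step of gated graph propagation (a GRUCell over aggregated neighbor messages) on the location transition graph, and concatenates the two representations into a session embedding that scores the next location. The dimensions, the single propagation step, and the mean-pooling readout are assumptions, not the authors' exact TSIS architecture.

```python
import torch
import torch.nn as nn

class TSISSketch(nn.Module):
    """Illustrative sketch only; not the authors' exact TSIS model."""
    def __init__(self, n_locations, sig_dim, hid=32):
        super().__init__()
        self.loc_emb = nn.Embedding(n_locations, hid)
        self.signal_gru = nn.GRU(sig_dim, hid, batch_first=True)   # signal temporal dynamics
        self.ggnn_cell = nn.GRUCell(hid, hid)                      # gated graph propagation
        self.msg = nn.Linear(hid, hid)
        self.out = nn.Linear(2 * hid, n_locations)                 # next-location scores

    def forward(self, loc_seq, signals, adj):
        # loc_seq: (B, T) location ids; signals: (B, T, sig_dim); adj: (N, N) transition graph
        h_nodes = self.loc_emb.weight                              # (N, hid)
        agg = adj @ self.msg(h_nodes)                              # aggregate neighbor messages
        h_nodes = self.ggnn_cell(agg, h_nodes)                     # one gated update step
        traj_repr = h_nodes[loc_seq].mean(dim=1)                   # (B, hid) trajectory summary
        _, sig_h = self.signal_gru(signals)                        # final hidden state (1, B, hid)
        session = torch.cat([traj_repr, sig_h.squeeze(0)], dim=-1) # joint session embedding
        return self.out(session)

model = TSISSketch(n_locations=10, sig_dim=4)
loc_seq = torch.randint(0, 10, (2, 5))
signals = torch.randn(2, 5, 4)
adj = torch.rand(10, 10)
print(model(loc_seq, signals, adj).shape)    # torch.Size([2, 10])
```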
The smart city, driven by Big Data and the Internet of Things (IoT), has become one of the most promising trends of the future. As one important function of the smart city, event alerting based on time series prediction faces the challenge of how to extract and represent discriminative features of sensing knowledge from the massive sequential data generated by IoT devices. In this paper, a framework based on a sparse representation model (SRM) for time series prediction is proposed as an efficient approach to tackle this challenge. After dividing the over-complete dictionary into upper and lower parts, the main idea of SRM is to first obtain the sparse representation of a time series based on the upper part, and then predict the future values based on the lower part. The choice of dictionary has a significant impact on the performance of SRM. This paper focuses on the study of dictionary construction strategies and summarizes eight variants of SRM. Experimental results demonstrate that SRM can deal with different types of time series prediction flexibly and effectively.
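The split-dictionary idea the abstract outlines can be illustrated with a short, hedged numpy sketch: stack the dictionary into an upper (history) and a lower (future) part, recover a sparse code of the observed history with a simple ISTA solver, then predict the future as the lower part applied to that code. The ISTA solver, parameter values, and toy sinusoidal dictionary are assumptions for illustration only.

```python
import numpy as np

def ista(D, x, lam=0.1, step=None, iters=200):
    """Minimal ISTA solver for min_a 0.5*||D a - x||^2 + lam*||a||_1 (assumed solver)."""
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - x)
        a = a - step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)   # soft thresholding
    return a

def srm_predict(D_upper, D_lower, history):
    """Sparse-code the observed history on the upper dictionary,
    then reconstruct the future values with the lower dictionary."""
    alpha = ista(D_upper, history)
    return D_lower @ alpha

# toy dictionary of 3 sinusoidal atoms, split into history (first 8) and future (last 4) samples
t = np.arange(12)
atoms = np.stack([np.sin(0.3 * t), np.sin(0.7 * t), np.cos(0.5 * t)], axis=1)
D_upper, D_lower = atoms[:8], atoms[8:]
history = np.sin(0.3 * t[:8])                       # series generated by the first atom
print(srm_predict(D_upper, D_lower, history))       # approximately np.sin(0.3 * t[8:])
```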
Visual information is highly advantageous for the evolutionary success of almost all animals. This information is likewise critical for many computing tasks, and visual computing has achieved tremendous success in numerous applications over the last 60 years or so. In that time, the development of visual computing has repeatedly moved forward with inspiration from biological mechanisms. In particular, deep neural networks were inspired by the hierarchical processing mechanisms that exist in the visual cortex of primate brains (including ours), and have achieved huge breakthroughs in many domain-specific visual tasks. In order to better understand biologically inspired visual computing, we present a survey of the current work, and we hope to offer some new avenues for rethinking visual computing and designing novel neural network architectures.
Manifold regularization (MR) provides a powerful framework for semi-supervised classification using both labeled and unlabeled data. Following the manifold assumption, it constrains similar instances over the manifold graph to share similar classification outputs. It is easily noted that MR is built on pairwise smoothness over the manifold graph, i.e., the smoothness constraint is imposed over all instance pairs and actually treats each instance pair as a single operand. However, smoothness can be pointwise in nature; that is, the smoothness shall inherently occur "everywhere" to relate the behavior of each point or instance to that of its close neighbors. Thus in this paper, we attempt to develop a pointwise MR (PW_MR for short) for semi-supervised learning by imposing constraints on individual local instances. In this way, the pointwise nature of smoothness is preserved, and moreover, by considering individual instances rather than instance pairs, the importance or contribution of individual instances can be introduced. Such importance can be described by the confidence of correct prediction or by the local density, for example. PW_MR provides a different way of implementing manifold smoothness. Finally, empirical results show the competitiveness of PW_MR compared with pairwise MR.
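The exact PW_MR functional is not given in the abstract, so the numpy sketch below only contrasts the classical pairwise smoothness penalty with one natural pointwise alternative: each instance is pulled toward the weighted average of its neighbors' outputs, scaled by a per-instance importance (e.g., local density or prediction confidence). Both the pointwise form and the importance weights are illustrative assumptions.

```python
import numpy as np

def pairwise_smoothness(F, W):
    """Classical MR penalty: sum_ij W_ij * ||f_i - f_j||^2 (each pair is one operand)."""
    diff = F[:, None, :] - F[None, :, :]
    return np.sum(W * np.sum(diff ** 2, axis=-1))

def pointwise_smoothness(F, W, importance):
    """Assumed pointwise variant: each instance is penalized for deviating from the
    weighted average of its neighbors, scaled by a per-instance importance."""
    row_sum = W.sum(axis=1, keepdims=True)
    row_sum[row_sum == 0] = 1.0
    neighbor_avg = (W @ F) / row_sum
    return np.sum(importance * np.sum((F - neighbor_avg) ** 2, axis=1))

F = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])   # soft outputs of 3 instances
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
print(pairwise_smoothness(F, W))
print(pointwise_smoothness(F, W, importance=np.array([1.0, 0.5, 1.0])))
```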
Sentence alignment is a basic task in natural language processing which aims to extract high-quality parallel sentences automatically. Motivated by the observation that aligned sentence pairs contain a larger number of aligned words than unaligned ones, we treat word translation as one of the most useful forms of external knowledge. In this paper, we show how to explicitly integrate word translation into neural sentence alignment. Specifically, we propose three cross-lingual encoders to incorporate word translation: 1) a Mixed Encoder that learns words and their translation annotation vectors over sequences in which words and their translations are mixed alternately; 2) a Factored Encoder that views word translations as features and encodes words and their translations by concatenating their embeddings; and 3) a Gated Encoder that uses a gating mechanism to selectively control the amount of word translation information passed forward. Experiments on the NIST MT and Opensubtitles Chinese-English datasets, in both non-monotonic and monotonic scenarios, demonstrate that all the proposed encoders significantly improve sentence alignment performance.
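For the third variant, the gating idea can be sketched with a hedged numpy example: a gate computed from a word embedding and its translation embedding decides, per dimension, how much translation information flows forward, h = g * e_word + (1 - g) * e_trans. The gate parameterization is an assumption; the abstract only names the mechanism.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_encode(e_word, e_trans, W, b):
    """Gate decides, per dimension, how much word translation information passes forward."""
    g = sigmoid(W @ np.concatenate([e_word, e_trans]) + b)
    return g * e_word + (1.0 - g) * e_trans

rng = np.random.default_rng(0)
d = 8
e_word, e_trans = rng.normal(size=d), rng.normal(size=d)
W, b = rng.normal(size=(d, 2 * d)) * 0.1, np.zeros(d)
print(gated_encode(e_word, e_trans, W, b))
```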
In the field of social network analysis, link prediction is one of the hottest topics and has attracted attention in both academia and industry. So far, the literature on link prediction can be roughly divided into two categories: similarity-based and learning-based methods. The learning-based methods have higher accuracy, but their time complexities are too high for complex networks. The similarity-based methods, by contrast, have the advantage of low time consumption, so improving their accuracy becomes a key issue. In this paper, we employ the community structure of social networks to improve prediction accuracy and propose the stretch shrink distance based algorithm (SSDBA). In SSDBA, we first detect the communities of a social network and identify active nodes based on the community average threshold (CAT) and the node average threshold (NAT) in each community. Second, we propose the stretch shrink distance (SSD) model to iteratively calculate the changes in distances between active nodes and their local neighbors. Finally, we make predictions when these links' distances tend to converge. Furthermore, extensive parameter learning has been carried out in the experiments. We compare SSDBA with other popular approaches, and the experimental results validate the effectiveness and efficiency of the proposed algorithm.
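The abstract names CAT and NAT without defining them, so the following sketch is purely an illustrative reading: a node is called "active" when its degree exceeds its community's average degree (read here as CAT) and the average degree of its neighbors (read here as NAT). Both readings are assumptions, not the paper's definitions.

```python
import numpy as np

def active_nodes(adj, communities):
    """Illustrative only: select nodes whose degree exceeds the community average
    (assumed CAT) and the average degree of their neighbors (assumed NAT)."""
    deg = adj.sum(axis=1)
    active = []
    for comm in communities:
        cat = deg[comm].mean()                           # community average threshold (assumed)
        for v in comm:
            nbrs = np.nonzero(adj[v])[0]
            nat = deg[nbrs].mean() if len(nbrs) else 0.0 # node average threshold (assumed)
            if deg[v] > cat and deg[v] >= nat:
                active.append(v)
    return active

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], float)
print(active_nodes(A, [[0, 1, 2], [3, 4]]))              # e.g. [2, 3]
```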
Fault localization techniques were originally proposed to assist manual debugging by producing a ranked list of suspicious locations. With the increasing popularity of automated program repair, fault localization techniques have been introduced to effectively reduce its search space. Unlike developers, who mainly focus on the rank information, current automated program repair has two strategies for using fault localization information: the suspiciousness-first algorithm (SFA), based on suspiciousness accuracy, and the rank-first algorithm (RFA), relying on rank accuracy. However, despite the fact that the two usages are widely adopted by current automated program repair and may lead to different repair results, little is known about their impacts on automated program repair. In this paper we empirically compare the performance of SFA and RFA in the context of automated program repair. Specifically, we implement the two strategies and six well-studied fault localization techniques in four state-of-the-art automated program repair tools, and then use these tools to perform repair experiments on 60 real-world bugs from Defects4J. Our study presents a number of interesting findings: RFA outperforms SFA in 70.02% of cases when measured by the number of candidate patches generated before a valid patch is found (NCP), while SFA performs better in parallel repair and patch diversity; the performance of SFA can be improved by increasing the suspiciousness accuracy of fault localization techniques; finally, SimFix deploying SFA successfully repairs four extra Defects4J bugs that the original SimFix using RFA cannot repair. These observations provide a new perspective for future research on the usage and improvement of fault localization in automated program repair.
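The NCP measure used for the comparison can be made concrete with a small, hedged sketch: count the candidate patches generated until the first one that passes validation. The patch representation and the toy validity check are assumptions for illustration; they do not reflect how the repair tools in the study generate patches.

```python
def ncp(candidate_patches, is_valid):
    """Number of Candidate Patches generated before a valid patch is found:
    count patches tried until the first one that passes validation."""
    count = 0
    for patch in candidate_patches:
        count += 1
        if is_valid(patch):
            return count
    return None                       # no valid patch within the generated candidates

# toy stream where the 4th generated patch is the first valid one (assumed data)
patches = ["p1", "p2", "p3", "p4", "p5"]
print(ncp(patches, is_valid=lambda p: p == "p4"))   # 4
```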
Resource planning is becoming an increasingly important and timely problem for cloud users. As more Web services are moved to the cloud, minimizing network usage is often a key driver of cost control. Most existing approaches focus on resources such as CPU, memory, and disk I/O; in particular, CPU receives the most attention from researchers, while bandwidth is largely neglected. It is challenging to predict the network throughput of modern Web services, due to diverse and complex responses, evolving Web services, and complex network transportation. In this paper, we propose a what-if analysis methodology, named Log2Sim, to plan the bandwidth resources of Web services. Log2Sim uses a lightweight workload model to describe user behavior, an automated mining approach to obtain the characteristics of workloads and responses from massive Web logs, and traffic-aware simulations to predict the impact on bandwidth consumption and response time in changing contexts. We use a real-life Web system and a classic benchmark to evaluate Log2Sim in multiple scenarios. The evaluation shows that Log2Sim performs well in predicting bandwidth consumption: the average relative error is 2% for the benchmark and 8% for the real-life system. As for response time, Log2Sim cannot produce accurate predictions for every single service request, but the simulation results always show trends in average response time similar to the real ones as the workload increases in different changing contexts. It can provide sufficient information for system administrators in proactive bandwidth planning.
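To make the kind of what-if estimate concrete, here is a hedged back-of-the-envelope sketch: mine average response sizes per request type from logs, then scale a hypothetical workload mix to project bandwidth consumption. The log format, field names, and linear scaling are assumptions; the actual Log2Sim relies on traffic-aware simulation rather than this simple calculation.

```python
from collections import defaultdict

def mine_response_sizes(log_lines):
    """Average response bytes per request type, mined from simplified access logs.
    Each line here is 'request_type bytes' -- an assumed format, not Log2Sim's."""
    totals, counts = defaultdict(int), defaultdict(int)
    for line in log_lines:
        req, size = line.split()
        totals[req] += int(size)
        counts[req] += 1
    return {req: totals[req] / counts[req] for req in totals}

def projected_bandwidth(avg_sizes, requests_per_sec):
    """What-if projection: expected bytes per second for a hypothetical workload mix."""
    return sum(avg_sizes[req] * rate for req, rate in requests_per_sec.items())

logs = ["search 2048", "search 4096", "login 512", "browse 10240"]
avg = mine_response_sizes(logs)
print(projected_bandwidth(avg, {"search": 50, "login": 5, "browse": 20}))  # bytes per second
```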
A wireless sensor network (WSN), which consists of a large number of sensor nodes with limited energy, is effective for monitoring a target environment. An efficient medium access control (MAC) protocol is thus imperative to maximize the energy efficiency and performance of a WSN. Most existing MAC protocols are based on scheduling the sleep and active periods of the nodes and do not consider the relationship between the load condition and performance. In this paper a novel scheme is proposed to properly determine the duty cycle of the WSN nodes according to the load, employing the Q-learning technique and function approximation with linear regression. This allows low-latency, energy-efficient scheduling for a wide range of traffic conditions and effectively overcomes the limitation of Q-learning with a continuous state-action space. NS3 simulation reveals that the proposed scheme significantly improves the throughput, latency, and energy efficiency compared with the existing fully active scheme and S-MAC.
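The core mechanism the abstract names, Q-learning with linear function approximation over a continuous load state, can be sketched in numpy as below: one weight vector per duty-cycle action, Q(s, a) = w_a . phi(s), updated with the standard TD rule. The load features, action set, and toy reward model are assumptions for illustration, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(1)
duty_cycles = [0.1, 0.3, 0.5, 0.8]                 # candidate duty cycles (assumed action set)
n_features = 3                                     # e.g. load, load^2, bias (assumed features)
W = np.zeros((len(duty_cycles), n_features))       # one linear Q-approximation per action

def q_values(phi):
    return W @ phi                                 # Q(s, a) = w_a . phi(s)

def toy_env_step(load, action):
    """Assumed toy model: reward trades delivered traffic against energy spent awake."""
    duty = duty_cycles[action]
    reward = min(load, duty) - 0.3 * duty
    next_load = float(np.clip(rng.normal(0.5, 0.15), 0.0, 1.0))
    return reward, next_load

alpha, gamma, epsilon = 0.05, 0.9, 0.1
load = 0.5
for _ in range(5000):
    phi = np.array([load, load ** 2, 1.0])
    a = rng.integers(len(duty_cycles)) if rng.random() < epsilon else int(np.argmax(q_values(phi)))
    reward, next_load = toy_env_step(load, a)
    phi_next = np.array([next_load, next_load ** 2, 1.0])
    td_error = reward + gamma * np.max(q_values(phi_next)) - q_values(phi)[a]
    W[a] += alpha * td_error * phi                 # gradient step on the linear approximation
    load = next_load

print("preferred duty cycle at load 0.5:",
      duty_cycles[int(np.argmax(W @ np.array([0.5, 0.25, 1.0])))])
```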