During a two-day strategic workshop in February 2018, 22 information retrieval researchers met to discuss the future challenges and opportunities within the field. The outcome is a list of potential research directions, project ideas, and challenges. This report describes the major conclusions we reached during the workshop. A key result is that we need to open our minds to embrace a broader IR field by rethinking the definitions of information, retrieval, user, system, and evaluation in IR. By providing detailed discussions on these topics, this report is expected to inspire IR researchers in both academia and industry, and help the future growth of the IR research community.
Resource planning is becoming an increasingly important and timely problem for cloud users. As more Web services are moved to the cloud, minimizing network usage is often a key driver of cost control. Most existing approaches focus on resources such as CPU, memory, and disk I/O. In particular, CPU receives the most attention from researchers, while bandwidth is somewhat neglected. It is challenging to predict the network throughput of modern Web services because of diverse and complex responses, evolving Web services, and complex network transportation. In this paper, we propose a methodology of what-if analysis, named Log2Sim, to plan the bandwidth resource of Web services. Log2Sim uses a lightweight workload model to describe user behavior, an automated mining approach to obtain characteristics of workloads and responses from massive Web logs, and traffic-aware simulations to predict the impact on bandwidth consumption and response time in changing contexts. We use a real-life Web system and a classic benchmark to evaluate Log2Sim in multiple scenarios. The evaluation results show that Log2Sim performs well in predicting bandwidth consumption: the average relative error is 2% for the benchmark and 8% for the real-life system. As for the response time, Log2Sim cannot produce accurate predictions for every single service request, but the simulation results consistently show trends in average response time similar to the measured ones as workloads increase in different changing contexts. It can therefore provide sufficient information for the system administrator in proactive bandwidth planning.
Privacy preservation is a primary concern in social networks, which employ a variety of privacy preservation mechanisms to protect sensitive user information including age, location, education, interests, and others. Matching user identities across different social networks is therefore considered a challenging task. In this work, we propose an algorithm to reveal user identities as a set of linked accounts from different social networks using limited user profile data, i.e., user name and friendship. Thus, we propose a framework, ExpandUIL, that includes three standalone algorithms based on (i) percolation graph matching in the ExpandFullName algorithm, (ii) a supervised machine learning algorithm that works with graph embedding, and (iii) a combination of the two, the ExpandUserLinkage algorithm. The proposed framework, as a set of algorithms, is significant because (i) it is based on the network topology and requires only the name feature of the nodes, (ii) it requires a considerably small initial seed set, and as few as one initial seed suffices, (iii) it is iterative and scalable, with applicability to online incoming stream graphs, and (iv) it has an experimental proof of stability over a real ground-truth dataset. Experiments on real datasets from the Instagram and VK social networks show up to 75% recall for linked accounts with 96% accuracy using only one given seed pair.
Mobile computing has fast emerged as a pervasive technology to replace the old computing paradigms with portable computation and context-aware communication. Existing software systems can be migrated (while preserving their data and logic) to mobile computing platforms that support portability, context-sensitivity, and enhanced usability. In recent years, some research and development efforts have focused on a systematic migration of existing software systems to mobile computing platforms.
To investigate the research state-of-the-art on the migration of existing software systems to mobile computing platforms. We aim to analyze the progression and impacts of existing research, highlight challenges and solutions that reflect dimensions of emerging and futuristic research.
We followed the evidence-based software engineering (EBSE) method to conduct a systematic mapping study (SMS) of the existing research that has progressed over more than a decade (25 studies published from 1996–2017). We have derived a taxonomical classification and a holistic mapping of the existing research to investigate its progress, impacts, and potential areas of futuristic research and development.
The SMS has identified three types of migration, namely Static, Dynamic, and State-based Migration of existing software systems to mobile computing platforms. Migration to mobile computing platforms enables existing software systems to achieve portability, context-sensitivity, and high connectivity. However, mobile systems may face some challenges such as resource poverty, data security, and privacy. The emerging and futuristic research aims to provide patterns and tool support to automate the migration process. The results of this SMS can benefit researchers and practitioners, by highlighting challenges, solutions, and tools, in conceptualizing the state-of-the-art and futuristic trends that support migration of existing software to mobile computing.
A wireless sensor network (WSN), which consists of a large number of sensor nodes with limited energy, is effective for monitoring a target environment. An efficient medium access control (MAC) protocol is thus imperative to maximize the energy efficiency and performance of a WSN. Most existing MAC protocols are based on scheduling the sleep and active periods of the nodes, and do not consider the relationship between the load condition and performance. In this paper, a novel scheme is proposed to properly determine the duty cycle of the WSN nodes according to the load, which employs the Q-learning technique and function approximation with linear regression. This allows low-latency, energy-efficient scheduling for a wide range of traffic conditions, and effectively overcomes the limitation of Q-learning in problems with a continuous state-action space. NS3 simulation reveals that the proposed scheme significantly improves the throughput, latency, and energy efficiency compared to the existing fully active scheme and S-MAC.
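The combination of Q-learning with a linear function approximator mentioned above can be illustrated with a minimal sketch. It is not the scheme from the paper: the feature vector, the candidate duty cycles, the reward, and the learning parameters are all hypothetical and chosen only to show the shape of one update step, in which Q(s, a) is approximated as a weighted sum of features so that continuous load values need no lookup table.

```python
# Q-learning with a linear function approximator (illustrative sketch only).
# The features, duty-cycle options, and reward below are assumptions,
# not values taken from the paper.

def features(load, duty):
    """Hypothetical feature vector for a (traffic load, duty cycle) pair."""
    return [1.0, load, duty, load * duty]

def q_value(weights, load, duty):
    """Approximate Q(s, a) as the dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(load, duty)))

def q_update(weights, load, duty, reward, next_load, duty_options,
             alpha=0.1, gamma=0.9):
    """One gradient-style Q-learning step on the linear weights."""
    best_next = max(q_value(weights, next_load, d) for d in duty_options)
    td_error = reward + gamma * best_next - q_value(weights, load, duty)
    phi = features(load, duty)
    return [w + alpha * td_error * f for w, f in zip(weights, phi)]

DUTY_OPTIONS = [0.1, 0.5, 1.0]          # assumed candidate duty cycles
weights = [0.0, 0.0, 0.0, 0.0]
weights = q_update(weights, load=0.5, duty=0.5, reward=1.0,
                   next_load=0.5, duty_options=DUTY_OPTIONS)
```

Because the weights are shared across all loads, an update observed at one load level also changes the estimate for nearby loads, which is what makes the approach usable when the state space is continuous.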
Visual information is highly advantageous for the evolutionary success of almost all animals. This information is likewise critical for many computing tasks, and visual computing has achieved tremendous successes in numerous applications over the last 60 years or so. In that time, the development of visual computing has moved forwards with inspiration from biological mechanisms many times. In particular, deep neural networks were inspired by the hierarchical processing mechanisms that exist in the visual cortex of primate brains (including ours), and have achieved huge breakthroughs in many domain-specific visual tasks. In order to better understand biologically inspired visual computing, we will present a survey of the current work, and hope to offer some new avenues for rethinking visual computing and designing novel neural network architectures.
Multi-label classification aims to assign a set of proper labels to each instance, where distance metric learning can help improve the generalization ability of instance-based multi-label classification models. Existing multi-label metric learning techniques work by utilizing pairwise constraints to enforce that examples with similar label assignments should be close in the embedded feature space. In this paper, a novel distance metric learning approach for multi-label classification is proposed by modeling structural interactions between the instance space and the label space. On one hand, a compositional distance metric is employed, which is represented as a weighted sum of rank-1 PSD matrices based on component bases. On the other hand, compositional weights are optimized by exploiting triplet similarity constraints derived from both the instance and label spaces. Due to the compositional nature of the employed distance metric, the resulting problem admits a quadratic programming formulation with linear optimization complexity w.r.t. the number of training examples. We also derive the generalization bound for the proposed approach based on algorithmic robustness analysis of the compositional metric. Extensive experiments on sixteen benchmark data sets clearly validate the usefulness of the compositional metric in yielding an effective distance metric for multi-label classification.
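To make the idea of a compositional metric concrete, the sketch below evaluates a squared distance under a metric written as a weighted sum of rank-1 PSD matrices, M = Σ_k w_k b_k b_kᵀ with w_k ≥ 0. The bases and weights here are made up for illustration; how the paper selects the bases and optimizes the weights from triplet constraints is not reproduced.

```python
# Squared distance under a compositional metric M = sum_k w_k * b_k b_k^T.
# With non-negative weights, M is PSD by construction.
# The bases and weights below are illustrative assumptions.

def compositional_distance(x, y, bases, weights):
    """Return (x - y)^T M (x - y) without forming M explicitly."""
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for basis, weight in zip(bases, weights):
        projection = sum(b * d for b, d in zip(basis, diff))
        total += weight * projection * projection
    return total

bases = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical component bases
weights = [1.0, 2.0, 0.5]                      # hypothetical learned weights
distance = compositional_distance([1.0, 2.0], [0.0, 0.0], bases, weights)
```

Since only the weights are free parameters, each triplet constraint is linear in them, which is consistent with the quadratic programming formulation described in the abstract.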
As a first attempt, this paper proposes a model for the Chinese high school timetabling problems (CHSTPs) under the new curriculum innovation which was launched in 2014 by the Chinese government. According to the new curriculum innovation, students in high school can choose subjects that they are interested in instead of being forced to select one of the two study directions, namely, Science and Liberal Arts. Meanwhile, they also need to attend compulsory subjects as before. CHSTPs are student-oriented and involve more student constraints, which makes them more complex than the typical “Class-Teacher model”, in which the element “Teacher” is the primary constraint. In this paper, we first describe in detail the mathematical model of CHSTPs and then design a new two-part representation for the candidate solution. Based on the new representation, we adopt a two-phase simulated annealing (SA) algorithm to solve CHSTPs. A total of 45 synthetic instances with different numbers of classes and teachers and different levels of student constraints are generated and used to illustrate the characteristics of the CHSTP model and the effectiveness of the designed representation and algorithm. Finally, we apply the proposed model, the designed two-part representation, and the two-phase SA to 10 real high schools.
Network embedding, which aims to embed a given network into a low-dimensional vector space, has been proved effective in various network analysis and mining tasks such as node classification, link prediction, and network visualization. Emerging network embedding methods have shifted their emphasis toward utilizing mature deep learning models. Neural network based network embedding has become a mainstream solution because of its high efficiency and capability of preserving the nonlinear characteristics of the network. In this paper, we propose Adversarial Network Embedding using Structural Similarity (ANESS), a novel, versatile, low-complexity GAN-based network embedding model which utilizes the inherent vertex-to-vertex structural similarity attribute of the network. ANESS learns robust and effective vertex embeddings via an adversarial training procedure. Specifically, our method aims to exploit the strengths of generative adversarial networks in generating high-quality samples and utilize the structural similarity identity of vertexes to learn the latent representations of a network. Meanwhile, ANESS can dynamically update the strategy of generating samples during each training iteration. Extensive experiments have been conducted on several benchmark network datasets, and empirical results demonstrate that ANESS significantly outperforms other state-of-the-art network embedding methods.
A sememe is defined as the minimum semantic unit of languages in linguistics. Sememe knowledge bases are built by manually annotating sememes for words and phrases. HowNet is the most well-known sememe knowledge base. It has been extensively utilized in many natural language processing tasks in the era of statistical natural language processing and proven to be effective and helpful to understanding and using languages. In the era of deep learning, although data are thought to be of vital importance, there are some studies working on incorporating sememe knowledge bases like HowNet into neural network models to enhance system performance. Some successful attempts have been made in the tasks including word representation learning, language modeling, semantic composition, etc. In addition, considering the high cost of manual annotation and update for sememe knowledge bases, some work has tried to use machine learning methods to automatically predict sememes for words and phrases to expand sememe knowledge bases. Besides, some studies try to extend HowNet to other languages by automatically predicting sememes for words and phrases in a new language. In this paper, we summarize recent studies on application and expansion of sememe knowledge bases and point out some future directions of research on sememes.
Density based clustering algorithms (DBCLAs) rely on the notion of density to identify clusters of arbitrary shapes and sizes with varying densities. Existing surveys on DBCLAs cover only a selected set of algorithms. These surveys fail to provide extensive information about the variety of DBCLAs proposed to date, including a taxonomy of the algorithms. In this paper we present a comprehensive survey of various DBCLAs over the last two decades along with their classification. We group the DBCLAs into four categories (density definition, parameter sensitivity, execution mode, and nature of data) and further divide them into various classes under each of these categories. In addition, we compare the DBCLAs through their common features and variations in citation and conceptual dependencies. We identify various application areas of DBCLAs in domains such as astronomy, earth sciences, molecular biology, geography, and multimedia. Our survey also identifies probable future directions of DBCLAs where involvement of density based methods may lead to favorable results.
This paper presents a comprehensive survey on the development of Intel SGX (software guard extensions) processors and their applications. With the advent of SGX in 2013 and its subsequent development, the corresponding research works are also increasing rapidly. In order to obtain a more comprehensive literature review related to SGX, we have made a systematic analysis of the related papers in this area. We first search through five large-scale paper retrieval libraries by keywords (i.e., ACM Digital Library, IEEE/IET Electronic Library, SpringerLink, Web of Science, and Elsevier Science Direct). We read and analyze a total of 128 SGX-related papers. The first round of extensive study is conducted to classify them. The second round of intensive study is carried out to complete a comprehensive analysis of the papers from various aspects. We start with the working environment of SGX and make a conclusive summary of the trusted execution environment (TEE). We then focus on the applications of SGX. We also review and study multifarious attack methods against the SGX framework and some recent security improvements made on SGX. Finally, we summarize the advantages and disadvantages of SGX with some future research opportunities. We hope this review can help existing and future research works on SGX and its applications for both developers and users.
Entity set expansion (ESE) aims to expand an entity seed set to obtain more entities which have common properties. ESE is important for many applications such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods employed knowledge graphs (KGs) to extend entities. However, they failed to effectively and efficiently utilize the rich semantics contained in a KG and ignored the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relation among seed entities in a KG and thus to retrieve candidate entities. Then we rank the entities according to the meta path based structural similarity. Furthermore, to utilize the text description of entities in Wikipedia, we propose an extended model CoMeSE++ which combines both structural information revealed by a KG and text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
Blockchain has recently emerged as a research trend, with potential applications in a broad range of industries and contexts. One particularly successful Blockchain technology is the smart contract, which is widely used in commercial settings (e.g., high value financial transactions). This, however, has security implications due to the potential to financially benefit from a security incident (e.g., identification and exploitation of a vulnerability in the smart contract or its implementation). Among the platforms that support smart contracts, Ethereum is the most active and prominent. Hence, in this paper, we systematically review existing research efforts on Ethereum smart contract security, published between 2015 and 2019. Specifically, we focus on how smart contracts can be maliciously exploited and targeted, such as security issues of the contract program model, vulnerabilities in the program, and safety considerations introduced by the program execution environment. We also identify potential research opportunities and a future research agenda.
Crowdsourcing has been a helpful mechanism to leverage human intelligence to acquire useful knowledge. However, when we aggregate the crowd knowledge based on the currently developed voting algorithms, it often results in common knowledge that may not be expected. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. With the help of an external knowledge base such as WordNet, we incorporate the semantic relations between the alternative answers into a probabilistic model to determine which answer is more specific. We formulate the probabilistic model considering both the worker’s ability and the task’s difficulty from the basic assumption, and solve it by the expectation-maximization (EM) algorithm. To increase algorithm compatibility, we also refine our method into a semi-supervised one. Experimental results show that our approach is robust to hyper-parameters and achieves larger improvements than majority voting and other algorithms when more specific answers are expected, especially for sparse data.
Next location prediction has aroused great interest in the era of the internet of things (IoT). With the ubiquitous deployment of sensor devices, e.g., GPS and Wi-Fi, the IoT environment offers new opportunities for proactively analyzing human mobility patterns and predicting a user’s future visits at low cost, both outdoors and indoors. In this paper, we consider the problem of next location prediction in the IoT environment in a session-based manner. We suggest that a user’s future intention in each session can be better inferred for more accurate prediction if patterns hidden inside both trajectory and signal strength sequences collected from IoT devices can be jointly modeled, which, however, existing state-of-the-art methods have rarely addressed. To this end, we propose a trajectory and sIgnal sequence (TSIS) model, where the trajectory transition regularities and signal temporal dynamics are jointly embedded in a neural network based model. Specifically, we employ a gated recurrent unit (GRU) for capturing the temporal dynamics in the multivariate signal strength sequence. Moreover, we adapt gated graph neural networks (gated GNNs) on location transition graphs to explicitly model the transition patterns of trajectories. Finally, the low-dimensional representations learned from the trajectory and the signal sequence are jointly optimized to construct a session embedding, which is further employed to predict the next location. Extensive experiments on two real-world Wi-Fi based mobility datasets demonstrate that TSIS is effective and robust for next location prediction compared with other competitive baselines.
The smart city driven by Big Data and the Internet of Things (IoT) has become one of the most promising trends of the future. As one important function of the smart city, event alert based on time series prediction is faced with the challenge of how to extract and represent discriminative features of sensing knowledge from the massive sequential data generated by IoT devices. In this paper, a framework based on a sparse representation model (SRM) for time series prediction is proposed as an efficient approach to tackle this challenge. After dividing the over-complete dictionary into upper and lower parts, the main idea of SRM is to first obtain the sparse representation of the time series based on the upper part, and then predict future values based on the lower part. The choice of dictionary has a significant impact on the performance of SRM. This paper focuses on the study of dictionary construction strategies and summarizes eight variants of SRM. Experimental results demonstrate that SRM can deal with different types of time series prediction flexibly and effectively.
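The two-step idea described above (sparse-code the observed window on the upper part of the dictionary, then reuse the same coefficients on the lower part to forecast) can be sketched as follows. This is an illustration under assumptions: the dictionary atoms are invented, and greedy matching pursuit is used as a stand-in sparse solver, since the abstract does not say which solver the authors use.

```python
# Sparse-representation forecasting sketch: each dictionary atom has an
# "upper" part (first p values, aligned with the observed history) and a
# "lower" part (remaining values, aligned with the future).
# The atoms and the matching-pursuit solver are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_nonzero=2):
    """Greedy sparse coding: repeatedly pick the atom best aligned with the residual."""
    residual = list(signal)
    coefs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        best_k, best_score = None, 1e-12
        for k, atom in enumerate(atoms):
            norm_sq = dot(atom, atom)
            if norm_sq == 0:
                continue
            score = abs(dot(residual, atom)) / norm_sq ** 0.5
            if score > best_score:
                best_k, best_score = k, score
        if best_k is None:          # residual is (numerically) zero
            break
        atom = atoms[best_k]
        step = dot(residual, atom) / dot(atom, atom)
        coefs[best_k] += step
        residual = [r - step * a for r, a in zip(residual, atom)]
    return coefs

def predict(history, dictionary, p, n_nonzero=2):
    """Code the history on the upper parts, then combine the lower parts."""
    upper = [atom[:p] for atom in dictionary]
    lower = [atom[p:] for atom in dictionary]
    coefs = matching_pursuit(history, upper, n_nonzero)
    horizon = len(lower[0])
    return [sum(c * part[j] for c, part in zip(coefs, lower))
            for j in range(horizon)]

dictionary = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], [4.0, 3.0, 2.0, 1.0]]
forecast = predict([2.0, 4.0, 6.0], dictionary, p=3)
```

In this toy case the history is exactly twice the upper part of the first atom, so the forecast is twice that atom's lower part.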
Quality assessment is a critical component in crowdsourcing-based software engineering (CBSE) as software products are developed by the crowd with unknown or varied skills and motivations. In this paper, we propose a novel metric called the project score to measure the performance of projects and the quality of products for competition-based software crowdsourcing development (CBSCD) activities. To the best of our knowledge, this is the first work to deal with the quality issue of CBSE from the perspective of projects instead of contests. In particular, we develop a hierarchical quality evaluation framework for CBSCD projects and come up with two metric aggregation models for project scores. The first model is a modified Squale model that can locate the software modules of poor quality, and the second one is a clustering-based aggregation model, which takes the different impacts of phases into account. To test the effectiveness of the proposed metrics, we conduct an empirical study on TopCoder, which is a famous CBSCD platform. Results show that the proposed project score is a strong indicator of the performance and product quality of CBSCD projects. We also find that the clustering-based aggregation model outperforms the Squale one by increasing the percentage of the performance evaluation criterion of aggregation models by an additional 29%. Our approach to quality assessment for CBSCD projects could potentially help software managers assess the overall quality of a crowdsourced project consisting of programming contests.
Fingerprint matching, spoof mitigation, and liveness detection are the trendiest biometric techniques, mostly because of fingerprints' stability through life, uniqueness, and low risk of invasion. In the recent decade, several techniques have been presented to address these challenges over well-known data sets. This study provides a comprehensive review of the fingerprint algorithms and techniques which have been published in the last few decades. It divides the research on fingerprints into nine different approaches, including feature based, fuzzy logic, holistic, image enhancement, latent, conventional machine learning, deep learning, template matching, and miscellaneous techniques. Among these, the deep learning approach has outperformed other approaches and gained significant attention for future research. The reviewed fingerprint literature is historically divided into four eras based on 106 referred papers and their cumulative citations.
If an adversary tries to obtain a secret s in a (t, n) threshold secret sharing (SS) scheme, it has to capture no fewer than t shares instead of the secret s directly. However, if a shareholder keeps a fixed share for a long time, an adversary may have chances to filch some shareholders’ shares. In a proactive secret sharing (PSS) scheme, shareholders are supposed to refresh shares at fixed periods without changing the secret. In this way, an adversary can recover the secret if and only if it captures at least t shares within a single period rather than at any time, and thus PSS provides enhanced protection for long-lived secrets. The existing PSS schemes are mostly based on linear SS, and no Chinese Remainder Theorem (CRT)-based PSS scheme has been proposed. This paper proposes a PSS scheme based on the CRT for the integer ring to analyze the reason why traditional CRT-based SS is not suitable for designing PSS schemes. Then, an ideal PSS scheme based on the CRT for the polynomial ring is also proposed. The scheme utilizes the isomorphism of the CRT to implement efficient share refreshing.
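As background for the CRT-based construction, the following sketch shows a classic CRT threshold scheme over the integers in the style of Mignotte. It is not the scheme proposed in the paper and includes no share refreshing; it also lacks the stronger secrecy guarantees of more careful constructions. The moduli and the secret are toy values chosen for illustration.

```python
# (t, n) threshold sharing via the Chinese Remainder Theorem (Mignotte style).
# Illustrative only: toy parameters, no proactive refresh, not the paper's scheme.
from math import prod

def make_shares(secret, moduli):
    """Each shareholder i receives the residue of the secret modulo m_i."""
    return [(m, secret % m) for m in moduli]

def reconstruct(shares):
    """Combine residues with the CRT; correct only if enough shares are given."""
    modulus = prod(m for m, _ in shares)
    total = 0
    for m, residue in shares:
        partial = modulus // m
        total += residue * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % modulus

# Pairwise coprime moduli for a (3, 4) scheme. The secret must lie between the
# product of the 2 largest moduli (17 * 19 = 323) and the product of the
# 3 smallest (11 * 13 * 17 = 2431).
MODULI = [11, 13, 17, 19]
SECRET = 1000
shares = make_shares(SECRET, MODULI)
recovered = reconstruct(shares[:3])
```

Any three shares determine the secret because their combined modulus exceeds it, while two shares only reveal the secret modulo a product that is smaller than the secret.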
Crowdsourcing has become an efficient means of solving machine-hard problems by embracing group wisdom, in which tasks are disseminated and assigned to a group of workers by way of open competition. The social relationships formed during this process may in turn contribute to the completion of future tasks. In this sense, it is necessary to take social factors into consideration in the research of crowdsourcing. However, there is currently little work on the interactions between social relationships and crowdsourcing. In this paper, we propose to study such interactions in social-oriented crowdsourcing systems from the perspective of task assignment. A prototype system is built to help users publish, assign, accept, and accomplish location-based crowdsourcing tasks, as well as to promote the development and utilization of social relationships during crowdsourcing. In particular, in order to exploit the potential relationships between crowdsourcing workers and tasks, we propose a “worker-task” accuracy estimation algorithm based on a graph model that joins the factorized matrices of both the user social network and the historical “worker-task” matrix. With the worker-task accuracy estimation matrix, a group of optimal worker candidates is efficiently chosen for a task, and a greedy task assignment algorithm is proposed to further the matching of worker-task pairs among multiple crowdsourcing tasks so as to maximize the overall accuracy. Compared with the similarity based task assignment algorithm, experimental results show that the average recommendation success rate increased by 3.67%; the average task completion rate increased by 6.17%; the number of new friends added per week increased from 7.4 to 10.5; and the average task acceptance time decreased by 8.5 seconds.
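A greedy assignment over an estimated worker-task accuracy matrix can be sketched as below. The one-worker-per-task and one-task-per-worker restriction, as well as the accuracy values, are simplifying assumptions made for this illustration; the paper's algorithm selects groups of candidates and may differ in its details.

```python
# Greedy worker-task matching on an estimated accuracy matrix.
# Assumption for illustration: each task gets one worker and each worker
# takes at most one task. The matrix values are made up.

def greedy_assign(accuracy):
    """accuracy[w][t] is the estimated accuracy of worker w on task t.

    Returns a dict mapping task index -> worker index.
    """
    pairs = sorted(
        ((acc, w, t) for w, row in enumerate(accuracy) for t, acc in enumerate(row)),
        key=lambda item: -item[0],
    )
    used_workers, assignment = set(), {}
    for acc, w, t in pairs:
        if w not in used_workers and t not in assignment:
            assignment[t] = w
            used_workers.add(w)
    return assignment

accuracy = [
    [0.9, 0.8],   # worker 0
    [0.7, 0.2],   # worker 1
    [0.4, 0.6],   # worker 2
]
assignment = greedy_assign(accuracy)
total_accuracy = sum(accuracy[w][t] for t, w in assignment.items())
```

Greedy selection is fast but not guaranteed to be optimal in general; an exact alternative for this restricted one-to-one setting would be an assignment solver such as the Hungarian method.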
Age estimation plays an important role in human-computer interaction systems. The lack of a large number of facial images with definite age labels makes age estimation algorithms inefficient. Deep label distribution learning (DLDL), which employs convolutional neural networks (CNN) and label distribution learning to learn ambiguity from the ground-truth age and adjacent ages, has been proven to outperform the current state-of-the-art frameworks. However, DLDL assumes a rough label distribution which covers all ages for any given age label. In this paper, a more practical label distribution paradigm is proposed: we limit the age label distribution so that it only covers a reasonable number of neighboring ages. In addition, we explore different label distributions to improve the performance of the proposed learning model. We employ CNN and the improved label distribution learning to estimate age. Experimental results show that, compared to DLDL, our method is more effective for facial age recognition.
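The idea of restricting a label distribution to neighboring ages can be illustrated with a short sketch. The Gaussian shape, the standard deviation, the neighborhood radius, and the age range are assumptions chosen for the example; the abstract states only that the distribution covers a reasonable number of neighboring ages and that several distributions were explored.

```python
# Label distribution limited to ages near the ground-truth label.
# The Gaussian shape, sigma, radius, and age range are illustrative assumptions.
import math

def truncated_label_distribution(age, sigma=2.0, radius=3, min_age=0, max_age=100):
    """Return {age: probability} covering only ages within `radius` of the label."""
    low, high = max(min_age, age - radius), min(max_age, age + radius)
    weights = {
        a: math.exp(-((a - age) ** 2) / (2 * sigma ** 2))
        for a in range(low, high + 1)
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

distribution = truncated_label_distribution(30)
```

Ages outside the window receive zero probability mass, in contrast to a distribution that spreads mass over the full age range.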
Automotive cyber physical systems (CPSs) are increasingly utilizing wireless technology for V2X communication as a potential way out for challenges regarding collision detection, wire strap up troubles, and collision avoidance. However, security is constrained as a result of the energy and performance limitations of modern wireless systems. Accordingly, the need for an efficient secret key generation and management mechanism for secured communication among computationally weak wireless devices has motivated the introduction of new authentication protocols. Recently, there has been great interest in physical layer based secret key generation schemes that utilize channel reciprocity. However, it is observed that the sequences generated by the two communicating parties contain mismatched bits which need to be reconciled by exchanging information over a public channel. This can be an immense security threat as it may let an adversary obtain and recover segments of the key in known channel conditions. We propose a Hopper-Blum based physical layer (HB-PL) authentication scheme in which an enhanced physical layer key generation method integrates the Hopper-Blum (HB) authentication protocol. The information collected from the shared channel is used as secret keys for the HB protocol, and the mismatched bits are used as the induced noise for the learning parity with noise (LPN) problem. The proposed scheme aims to provide a way out for the bit reconciliation process without leakage of information over a public channel. Moreover, the HB protocol is computationally efficient and simple, which helps to reduce the number of exchanged messages during the authentication process. We have performed several experiments which show that our proposed design can generate secret keys with improved security strength and high performance in comparison to current authentication techniques. Our scheme requires fewer than 55 exchanged messages to achieve more than 95% correct authentication.
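For readers unfamiliar with the HB protocol, a minimal simulation of its basic round structure is sketched below: the verifier sends a random binary challenge, the prover answers with the inner product of challenge and key modulo 2, flipped with some noise probability, and the verifier accepts if the number of wrong answers stays below a threshold. The key length, round count, noise rate, and threshold are illustrative assumptions, and the noise here is drawn at random rather than derived from channel mismatches as in the proposed HB-PL scheme.

```python
# Basic Hopper-Blum (HB) authentication rounds, simulated in one process.
# Parameters are illustrative; the noise is random Bernoulli noise, not the
# channel-derived mismatched bits used by the HB-PL scheme.
import random

def inner_product_mod2(a, x):
    return sum(ai & xi for ai, xi in zip(a, x)) % 2

def hb_authenticate(prover_key, verifier_key, rounds=50, noise=0.125,
                    threshold=0.25, rng=None):
    """Return True if the prover's noisy answers are consistent with the verifier's key."""
    rng = rng or random.Random()
    errors = 0
    for _ in range(rounds):
        challenge = [rng.randint(0, 1) for _ in prover_key]
        flip = 1 if rng.random() < noise else 0
        response = inner_product_mod2(challenge, prover_key) ^ flip
        if response != inner_product_mod2(challenge, verifier_key):
            errors += 1
    return errors <= threshold * rounds

key = [1, 0, 1, 1, 0, 0, 1, 0]
accepted = hb_authenticate(key, key, noise=0.0, rng=random.Random(0))
```

Each round costs only a few bit operations, which is why HB-style protocols are attractive for computationally weak devices; the injected noise is what ties an eavesdropper's task to the LPN problem.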
From the viewpoint of extreme value theory, the unseen novel classes in open-set recognition can be seen as the extreme values of the training classes. Following this idea, we introduce the margin and coverage distributions to model the training classes. A novel visual-semantic embedding framework, extreme vocabulary learning (EVoL), is proposed; EVoL embeds the visual features into the semantic space in a probabilistic way. Notably, we adopt the vast open vocabulary in the semantic space to help further constrain the margin and coverage of the training classes. The learned embedding can directly be used to solve supervised learning, zero-shot learning, and open set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed framework against conventional approaches.
Manifold regularization (MR) provides a powerful framework for semi-supervised classification using both the labeled and unlabeled data. It constrains that similar instances over the manifold graph should share similar classification outputs according to the manifold assumption. It is easily noted that MR is built on the pairwise smoothness over the manifold graph, i.e., the smoothness constraint is implemented over all instance pairs and actually considers each instance pair as a single operand. However, the smoothness can be pointwise in nature, that is, the smoothness shall inherently occur “everywhere” to relate the behavior of each point or instance to that of its close neighbors. Thus in this paper, we attempt to develop a pointwise MR (PW_MR for short) for semi-supervised learning through constraining on individual local instances. In this way, the pointwise nature of smoothness is preserved, and moreover, by considering individual instances rather than instance pairs, the importance or contribution of individual instances can be introduced. Such importance can be described by the confidence for correct prediction, or the local density, for example. PW_MR provides a different way for implementing manifold smoothness. Finally, empirical results show the competitiveness of PW_MR compared to pairwise MR.
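The pairwise smoothness term that standard MR builds on can be checked numerically. The sketch below computes the pairwise penalty ½ Σ_ij W_ij (f_i − f_j)² and the equivalent graph Laplacian form fᵀ(D − W)f on a small made-up graph; the weights and outputs are illustrative, and the pointwise formulation of PW_MR itself is not reproduced here.

```python
# Pairwise manifold smoothness and its graph Laplacian form.
# The graph weights W and the outputs f are made-up illustration values.

def pairwise_smoothness(W, f):
    """0.5 * sum_ij W[i][j] * (f[i] - f[j])^2 over all instance pairs."""
    n = len(f)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))

def laplacian_quadratic(W, f):
    """f^T (D - W) f, where D is the diagonal degree matrix of W."""
    n = len(f)
    degree = [sum(W[i]) for i in range(n)]
    diagonal = sum(degree[i] * f[i] ** 2 for i in range(n))
    cross = sum(W[i][j] * f[i] * f[j] for i in range(n) for j in range(n))
    return diagonal - cross

W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]      # symmetric similarity graph
f = [1.0, 0.0, -1.0]       # classifier outputs on the three instances
pairwise = pairwise_smoothness(W, f)
quadratic = laplacian_quadratic(W, f)
```

Both expressions give the same value for a symmetric W, which shows how the penalty treats each instance pair as a single operand, the property that the pointwise formulation sets out to change.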