With the development of the Internet, people are increasingly likely to post and propagate opinions online. Sentiment analysis has therefore become an important challenge in understanding the polarity beneath these comments. Many approaches from the natural language processing perspective have been employed for this task; widely used ones include bag-of-words and semantics-oriented analysis methods. In this research, we further investigate the structural information among words, phrases and sentences within comments to conduct sentiment analysis. The idea is inspired by the fact that structural information plays an important role in identifying the polarity of the overall statement. We therefore propose a novel sentiment analysis model based on a recurrent neural network, which takes a partial document as input and then predicts the sentiment label distribution of the following parts, rather than the next word. The proposed method learns word representations and the sentiment distribution simultaneously. Experimental studies have been conducted on commonly used datasets, and the results show the method's promising potential.
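To make the idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: an Elman-style recurrent network reads a comment word by word and, after each prefix, emits a distribution over sentiment labels instead of a next-word prediction. The toy vocabulary, dimensions, and random (untrained) weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "awful": 4}
embed_dim, hidden_dim, num_labels = 8, 16, 2  # labels: negative/positive

E = rng.normal(0, 0.1, (len(vocab), embed_dim))      # word embeddings
W_xh = rng.normal(0, 0.1, (embed_dim, hidden_dim))   # input -> hidden
W_hh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))  # hidden -> hidden
W_hy = rng.normal(0, 0.1, (hidden_dim, num_labels))  # hidden -> labels

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sentiment_over_prefixes(words):
    """Return the predicted label distribution after each partial document."""
    h = np.zeros(hidden_dim)
    dists = []
    for w in words:
        h = np.tanh(E[vocab[w]] @ W_xh + h @ W_hh)  # update recurrent state
        dists.append(softmax(h @ W_hy))             # predict sentiment, not next word
    return dists

for dist in sentiment_over_prefixes(["the", "movie", "was", "great"]):
    print(dist)
```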
For many supervised learning applications, additional information beyond the labels is often available during training but not during testing. Such additional information, referred to as privileged information, can be exploited during training to construct a better classifier. In this paper, we propose a Bayesian network (BN) approach for learning with privileged information. We propose to incorporate the privileged information through a three-node BN. We further mathematically evaluate the different topologies of the three-node BN and identify those structures through which the privileged information can benefit the classification. Experimental results on handwritten digit recognition, spontaneous versus posed expression recognition, and gender recognition demonstrate the effectiveness of our approach.
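As an illustration of how a three-node BN can exploit a training-only variable, here is a minimal discrete sketch using one candidate topology, a chain X → X* → Y, in which the privileged feature X* mediates between the observed feature and the label and is marginalized out at test time. The topology, cardinalities, and toy counts are assumptions for illustration only; the paper's mathematical analysis is what determines which topologies actually help.

```python
import numpy as np

nx, nxs, ny = 3, 2, 2  # cardinalities of X, privileged Xs, and label Y

# Toy training triples (x, xs, y); the privileged xs is observed only here.
train = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (1, 1, 1), (2, 1, 1), (2, 1, 1)]

# Maximum-likelihood CPTs with Laplace smoothing for the chain X -> Xs -> Y.
p_xs_given_x = np.ones((nx, nxs))
p_y_given_xs = np.ones((nxs, ny))
for x, xs, y in train:
    p_xs_given_x[x, xs] += 1
    p_y_given_xs[xs, y] += 1
p_xs_given_x /= p_xs_given_x.sum(axis=1, keepdims=True)
p_y_given_xs /= p_y_given_xs.sum(axis=1, keepdims=True)

def predict(x):
    """p(y | x): marginalize out the privileged node, unobserved at test time."""
    return p_xs_given_x[x] @ p_y_given_xs

for x in range(nx):
    print(x, predict(x))
```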
Most of our learning comes from other people or from our own experience. For instance, when a taxi driver is seeking passengers on an unknown road in a large city, what should the driver do? Alternatives include cruising around the road, waiting for a period at the roadside in the hope of finding a passenger, or simply leaving for another road en route to a destination he knows (e.g., a hotel taxi rank). This interesting problem arises every day in cities all over the world. There could be different answers to the question posed above, but one fundamental problem is how the driver learns about the likelihood of finding passengers on a road that is new to him (as he has not picked up or dropped off passengers there before). Our observation from large-scale taxi driver trace data is that a driver learns not only from his own experience but also through interactions with other drivers. In this paper, we first formally define this problem as socialized information learning (SIL); second, we propose a framework, including a series of models, to study how a taxi driver gathers and learns information in an uncertain environment through the use of his social network. Finally, large-scale real-life data and empirical experiments confirm that our models are much more effective, efficient and scalable than prior work on this problem.
The rapidly increasing scale of data warehouses is challenging today's data analytics technologies. A conventional analytical platform processes data warehouse queries using a star schema: it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users' demands. This model is space-economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prohibits a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, even though many join results could actually be reused across different queries.
In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallelized filter-aggregation operations on the fact table. In contrast to conventional query processing models, our approach is efficient, scalable and stable despite the large number of tables involved in a join, and it is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that the framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
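The following toy sketch illustrates the resulting query shape under an assumed schema (the table contents and column names are illustrative, not the paper's benchmark): once the pre-processing phase has resolved the needed dimension attributes into the fact rows, a star-join query degenerates into a single filter-aggregation scan that parallelizes trivially.

```python
# Pre-processing phase (the "partial join"): dimension attributes needed by
# filters have already been pushed into the fact rows, so the query itself
# never performs a join.
fact = [
    # (store_region, product_category, sale_amount)
    ("EU", "books", 12.0),
    ("EU", "toys",   7.5),
    ("US", "books",  9.0),
    ("US", "toys",  20.0),
]

# Query: total sales of 'books' per region -- now just a scan with a
# filter (category == 'books') plus an aggregation (sum by region).
totals = {}
for region, category, amount in fact:
    if category == "books":                              # filter
        totals[region] = totals.get(region, 0.0) + amount  # aggregate

print(totals)  # {'EU': 12.0, 'US': 9.0}
```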
Recently, XML has become a standard for data representation and the preferred method of encoding structured data for exchange over the Internet. Moreover, it is frequently used as a logical format for storing structured and semi-structured data in databases. We propose a model-driven and configurable approach to modeling hierarchical XML data using object role modeling (ORM) as a flat conceptual model. First, a non-hierarchical conceptual schema of the problem domain is built using ORM; then, different hierarchical views of the conceptual schema, or parts of it, are specified by the designer using transformation rules. A hierarchical modeling notation called H-ORM is proposed to show these hierarchical views and to model more complex semi-structured data constructs and constraints. We also propose an algorithm to map hierarchical H-ORM views to the XML schema language.
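For intuition about the flat-to-hierarchical step (this sketch does not reproduce the H-ORM notation or the paper's mapping algorithm), here a flat fact base is nested into one designer-chosen hierarchical XML view; the relations, the chosen nesting of Employee under Department, and the element names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Flat, non-hierarchical facts, as an ORM-style conceptual schema would hold them.
departments = [{"id": "D1", "name": "Sales"}, {"id": "D2", "name": "R&D"}]
employees = [
    {"id": "E1", "name": "Ada",   "dept": "D1"},
    {"id": "E2", "name": "Grace", "dept": "D2"},
]

# One transformation rule: nest each Employee under its Department.
root = ET.Element("company")
for d in departments:
    d_el = ET.SubElement(root, "department", id=d["id"])
    ET.SubElement(d_el, "name").text = d["name"]
    for e in employees:
        if e["dept"] == d["id"]:
            e_el = ET.SubElement(d_el, "employee", id=e["id"])
            ET.SubElement(e_el, "name").text = e["name"]

print(ET.tostring(root, encoding="unicode"))
```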
In this paper, we focus on the efficient construction of restricted subtree (RSubtree) results for XML keyword queries on a multicore system. We first show that the performance bottlenecks of existing methods lie in 1) computing the set of relevant keyword nodes (RKNs) for each subtree root node, 2) constructing the corresponding RSubtree, and 3) parallel execution. We then propose a two-step generic top-down subtree construction algorithm, which computes SLCA/ELCA nodes in the first step, then computes RKNs and generates RSubtree results in parallel in the second step. Here, generic means that 1) our method can be used to compute different kinds of subtree results, and 2) it is independent of the query semantics; top-down means that our method constructs each RSubtree by visiting the nodes of the subtree built from an RKN set level by level from left to right, so as to avoid visiting as many useless nodes as possible. The experimental results show that our method is much more efficient than existing ones according to various metrics.
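As a reference point for the first step, here is a naive sketch of SLCA computation over Dewey-encoded node labels: a quadratic baseline, not the paper's optimized algorithm. The tiny inverted lists are illustrative assumptions.

```python
def lca(a, b):
    """Longest common prefix of two Dewey labels = their lowest common ancestor."""
    p = []
    for x, y in zip(a, b):
        if x != y:
            break
        p.append(x)
    return tuple(p)

def is_ancestor(a, d):
    return len(a) < len(d) and d[: len(a)] == a

# Inverted lists: Dewey labels of the nodes containing each keyword.
k1 = [(0, 0, 1), (0, 1, 0)]
k2 = [(0, 0, 2), (0, 2)]

# Candidate LCAs of all pairs; SLCAs are those with no descendant candidate.
candidates = {lca(a, b) for a in k1 for b in k2}
slcas = [c for c in candidates if not any(is_ancestor(c, d) for d in candidates)]
print(sorted(slcas))  # [(0, 0)]
```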
Open and dynamic environments lead to inherent uncertainty in Web service QoS (Quality of Service), and the QoS-aware service selection problem can be viewed as a decision problem under uncertainty. We use an empirical distribution function to describe the uncertainty of scores obtained from historical transactions. We then propose an approach to discovering the admissible set of services, i.e., the alternative services that are not dominated by any other alternative according to the expected utility criterion. Stochastic dominance (SD) rules are used to compare two services with uncertain scores regardless of the distribution form of those scores. By using the properties of SD rules, an algorithm is developed to reduce the number of SD tests, through which the admissible services can be reported progressively. We prove that the proposed algorithm can be run on partitioned or incremental sets of alternative services. Moreover, we derive some useful theoretical conclusions for correctly pruning unnecessary calculations and comparisons in each SD test, by which the efficiency of the SD tests can be improved. We report a comprehensive experimental study on real datasets that evaluates the effectiveness, efficiency, and scalability of the proposed algorithm.
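For intuition about a single SD test, the following sketch checks first-order stochastic dominance between two services directly from the empirical distribution functions of their historical scores; the sample scores are toy assumptions, and the paper's contribution lies in organizing and pruning many such tests, not in the test itself.

```python
import numpy as np

def ecdf(samples, grid):
    """Empirical CDF of `samples` evaluated at each point of `grid`."""
    samples = np.sort(samples)
    return np.searchsorted(samples, grid, side="right") / len(samples)

def fsd_dominates(a, b):
    """True if the service with scores `a` first-order dominates scores `b`,
    i.e., a's empirical CDF lies at or below b's everywhere (higher scores
    are at least as likely), with strict inequality somewhere."""
    grid = np.union1d(a, b)
    fa, fb = ecdf(np.asarray(a), grid), ecdf(np.asarray(b), grid)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

scores_a = [0.7, 0.8, 0.9, 0.95]  # historical scores of service A
scores_b = [0.5, 0.6, 0.8, 0.9]   # historical scores of service B
print(fsd_dominates(scores_a, scores_b))  # True: B is dominated, not admissible
```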
Mobile ad-hoc networks (MANETs) and wireless sensor networks (WSNs) have gained remarkable appreciation and technological development over the last few years. Despite their ease of deployment, numerous applications and significant advantages, security has always been a challenging issue due to the nature of the environments in which nodes operate. Physical capture of nodes and malicious or selfish behavior cannot be detected by traditional security schemes. Trust- and reputation-based approaches have gained global recognition as an additional means of security for decision making in sensor and ad-hoc networks. This paper provides an extensive literature review of trust- and reputation-based models in both sensor and ad-hoc networks. Based on the mechanism of trust establishment, we categorize the state-of-the-art into two groups, namely node-centric trust models and system-centric trust models. Based on trust evidence, initialization, computation, propagation and weight assignment, we evaluate the efficacy of the existing schemes. Finally, we conclude our discussion by identifying some unresolved issues in trust and reputation management.
Despite the various attractive features that the Cloud has to offer, the rate of Cloud migration is rather slow, primarily due to the serious security and privacy issues in the paradigm. One of the main problems in this regard, and the focus of our research, is authorization in the Cloud environment. In this paper, we present a systematic analysis of the existing authorization solutions in the Cloud and evaluate their effectiveness against well-established industrial standards that conform to the unique access control requirements of the domain. Our analysis can benefit organizations by helping them decide on the best authorization technique for deployment in the Cloud; a case study along with simulation results is also presented to illustrate the procedure of using our qualitative analysis to select an appropriate technique according to Cloud consumer requirements. From the results of this evaluation, we derive the general shortcomings of the extant access control techniques that keep them from providing successful authorization and, therefore, from being widely adopted by the Cloud community. To that end, we enumerate the features an ideal access control mechanism for the Cloud should have, and combine them to suggest the ultimate solution to this major security challenge: access control as a service (ACaaS) for the software as a service (SaaS) layer. We conclude that meticulous research is needed to incorporate the identified authorization features into a generic ACaaS framework, which should provide a high level of extensibility and security by integrating multiple access control models.
Dynamic consolidation of virtual machines (VMs) in a data center is an effective way to reduce energy consumption and improve physical resource utilization. Determining which VMs should be migrated from an overloaded host directly influences the VM migration time, increases the energy consumption of the whole data center, and can cause the service level agreement (SLA) between providers and users to be violated. Therefore, when designing a VM selection policy, we consider not only CPU utilization but also define a variable that represents the degree of resource satisfaction for selecting the VMs. In addition, we propose a novel VM placement policy that prefers placing a migratable VM on the host that has the minimum correlation coefficient: the bigger the correlation coefficient of a host, the greater the influence on the VMs located on that host after the migration. Using CloudSim, we run simulations whose results lead us to conclude that the policies proposed in this paper outperform existing policies in terms of energy consumption, VM migration time, and SLA violation percentage.
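A minimal sketch of the placement rule, assuming toy CPU-utilization histories (the host traces and VM demand below are illustrative, and this stands outside the CloudSim simulation): among candidate hosts, pick the one whose recent utilization series has the minimum correlation coefficient with the migrating VM's demand, since a highly correlated host would be stressed exactly when the VM is busiest.

```python
import numpy as np

vm_trace = np.array([0.2, 0.5, 0.7, 0.6, 0.3])  # migratable VM's CPU demand

hosts = {
    "host-a": np.array([0.3, 0.6, 0.8, 0.7, 0.4]),  # moves with the VM
    "host-b": np.array([0.8, 0.5, 0.3, 0.4, 0.7]),  # moves against the VM
}

def placement_target(vm, hosts):
    """Pick the host with the minimum correlation coefficient to the VM."""
    corr = {h: np.corrcoef(vm, u)[0, 1] for h, u in hosts.items()}
    return min(corr, key=corr.get), corr

target, corr = placement_target(vm_trace, hosts)
print(corr)    # host-a ~ +1.0, host-b ~ -1.0
print(target)  # 'host-b'
```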