In this paper, we propose a more efficient Bayesian network structure learning algorithm under the framework of score based local learning (SLL). Our algorithm significantly improves computational efficiency by restricting the neighbors of each variable to a small subset of candidates and storing necessary information to uncover the spouses, at the same time guaranteeing to find the optimal neighbor set in the same sense as SLL. The algorithm is theoretically sound in the sense that it is optimal in the limit of large sample size. Empirical results testify its improved speed without loss of quality in the learned structures.
In today’s world of excessive development in technologies, sustainability and adaptability of computer applications is a challenge, and future prediction became significant. Therefore, strong artificial intelligence (AI) became important and, thus, statistical machine learning (ML) methods were applied to serve it. These methods are very difficult to understand, and they predict the future without showing how. However, understanding of how machines make their decision is also important, especially in information system domain. Consequently, incremental covering algorithms (CA) can be used to produce simple rules to make difficult decisions. Nevertheless, even though using simple CA as the base of strong AI agent would be a novel idea but doing so with the methods available in CA is not possible. It was found that having to accurately update the discovered rules based on new information in CA is a challenge and needs extra attention. In specific, incomplete data with missing classes is inappropriately considered, whereby the speed and data size was also a concern, and future none existing classes were neglected. Consequently, this paper will introduce a novel algorithm called RULES-IT, in order to solve the problems of incremental CA and introduce it into strong AI. This algorithm is the first incremental algorithm in its family, and CA as a whole, that transfer rules of different domains to improve the performance, generalize the induction, take advantage of past experience in different domain, and make the learner more intelligent. It is also the first to introduce intelligent aspects into incremental CA, including consciousness, subjective emotions, awareness, and adjustment. Furthermore, all decisions made can be understood due to the simple representation of repository as rules. Finally, RULES-IT performance will be benchmarked with six different methods and compared with its predecessors to see the effect of transferring rules in the learning process, and to prove how RULES-IT actually solved the shortcoming of current incremental CA in addition to its improvement in the total performance.
In many data stream mining applications, traditional density estimation methods such as kernel density estimation, reduced set density estimation can not be applied to the density estimation of data streams because of their high computational burden, processing time and intensive memory allocation requirement. In order to reduce the time and space complexity, a novel density estimation method Dm-KDE over data streams based on the proposed algorithm m-KDE which can be used to design a KDE estimator with the fixed number of kernel components for a dataset is proposed. In this method, Dm-KDE sequence entries are created by algorithm m-KDE instead of all kernels obtained from other density estimation methods. In order to further reduce the storage space, Dm-KDE sequence entries can be merged by calculating their KL divergences. Finally, the probability density functions over arbitrary time or entire time can be estimated through the obtained estimation model. In contrast to the state-of-the-art algorithm SOMKE, the distinctive advantage of the proposed algorithm Dm-KDE exists in that it can achieve the same accuracy with much less fixed number of kernel components such that it is suitable for the scenarios where higher on-line computation about the kernel density estimation over data streams is required.We compare Dm-KDE with SOMKE and M-kernel in terms of density estimation accuracy and running time for various stationary datasets. We also apply Dm-KDE to evolving data streams. Experimental results illustrate the effectiveness of the proposed method.
Topical Web crawling is an established technique for domain-specific information retrieval. However, almost all the conventional topical Web crawlers focus on building crawlers using different classifiers, which needs a lot of labeled training data that is very difficult to label manually. This paper presents a novel approach called clustering-based topical Web crawling which is utilized to retrieve information on a specific domain based on link-context and does not require any labeled training data. In order to collect domain-specific content units, a novel hierarchical clustering method called bottom-up approach is used to illustrate the process of clustering where a new data structure, a linked list in combination with CFu-tree, is implemented to store cluster label, feature vector and content unit. During clustering, four metrics are presented. First, comparison variation (CV) is defined to judge whether the closest pair of clusters can be merged. Second, cluster impurity (CIP) evaluates the cluster error. Then, the precision and recall of clustering are also presented to evaluate the accuracy and comprehensive degree of the whole clustering process. Link-context extraction technique is used to expand the feature vector of anchor text which improves the clustering accuracy greatly. Experimental results show that the performance of our proposed method overcomes conventional focused Web crawlers both in Harvest rate and Target recall.
Market making (MM) strategies have played an important role in the electronic stock market. However, the MM strategies without any forecasting power are not safe while trading. In this paper, we design and implement a two-tier framework, which includes a trading signal generator based on a supervised learning approach and an event-driven MM strategy. The proposed generator incorporates the information within order book microstructure and market news to provide directional predictions. The MM strategy in the second tier trades on the signals and prevents itself from profit loss led by market trending. Using half a year price tick data from Tokyo Stock Exchange (TSE) and Shanghai Stock Exchange (SSE), and corresponding Thomson Reuters news of the same time period, we conduct the back-testing and simulation on an industrial near-to-reality simulator. From the empirical results, we find that 1) strategies with signals perform better than strategies without any signal in terms of average daily profit and loss (PnL) and sharpe ratio (SR), and 2) correct predictions do help MM strategies readjust their quoting along with market trending, which avoids the strategies triggering stop loss procedure that further realizes the paper loss.
Facial expression and emotion recognition from thermal infrared images has attracted more and more attentions in recent years. However, the features adopted in current work are either temperature statistical parameters extracted from the facial regions of interest or several hand-crafted features that are commonly used in visible spectrum. Till now there are no image features specially designed for thermal infrared images. In this paper, we propose using the deep Boltzmann machine to learn thermal features for emotion recognition from thermal infrared facial images. First, the face is located and normalized from the thermal infrared images. Then, a deep Boltzmann machine model composed of two layers is trained. The parameters of the deep Boltzmann machine model are further fine-tuned for emotion recognition after pre-training of feature learning. Comparative experimental results on the NVIE database demonstrate that our approach outperforms other approaches using temperature statistic features or hand-crafted features borrowed from visible domain. The learned features from the forehead, eye, and mouth are more effective for discriminating valence dimension of emotion than other facial areas. In addition, our study shows that adding unlabeled data from other database during training can also improve feature learning performance.
Road traffic density has always been a concern in large cities around the world, and many approaches were developed to assist in solving congestions related to slow traffic flow. This work proposes a congestion rate estimation approach that relies on real-time video scenes of road traffic, and was implemented and evaluated on eight different hotspots covering 33 different urban roads. The approach relies on road scene morphology for estimation of vehicles average speed along with measuring the overall video scenes randomness acting as a frame texture analysis indicator. Experimental results shows the feasibility of the proposed approach in reliably estimating traffic density and in providing an early warning to drivers on road conditions, thereby mitigating the negative effect of slow traffic flow on their daily lives.
Annotating named entity recognition (NER) training corpora is a costly but necessary process for supervised NER approaches. This paper presents a general framework to generate large-scale NER training data from parallel corpora. In our method, we first employ a high performance NER system on one side of a bilingual corpus. Then, we project the named entity (NE) labels to the other side according to the word level alignments. Finally, we propose several strategies to select high-quality auto-labeled NER training data. We apply our approach to Chinese NER using an English-Chinese parallel corpus. Experimental results show that our approach can collect high-quality labeled data and can help improve Chinese NER.
In this paper, a new preference multi-objective optimization algorithm called immune clone algorithm based on reference direction method (RD-ICA) is proposed for solving many-objective optimization problems. First, an intelligent recombination operator, which performs well on the functions comprising many parameters, is introduced into an immune clone algorithm so as to explore the potentially excellent gene segments of all individuals in the antibody population. Second, a reference direction method, a very strict ranking based on the desire of decision makers (DMs), is used to guide selection and clone of the active population. Then a light beam search (LBS) is borrowed to pick out a small set of individuals filling the external population. The proposed method has been extensively compared with other recently proposed evolutionary multi-objective optimization (EMO) approaches over DTLZ problems with from 4 to 100 objectives. Experimental results indicate RD-ICA can achieve competitive results.
As an improtant cryptographic scheme, signcryption scheme has been widely used in applications since it could provide both of signature and encryption. With the development of the certificateless public key cryptography (CLPKC), many certificatelss signcryption (CLSC) schemes using bilinear pairing hve been proposed. Comparated other operations, the bilinear pairing operaion is much more compulicated. Therefore, CLSC scheme without bilinear pairing is more suitable for applications. Recently, Jing et al. proposed a CLSC scheme without bilinear pairing and claimed their scheme is secure against two types of adversaries. In this paper, we will show their scheme provide neither unforgeability property nor confidentiality property. To improve security, we also propose a new CLSC scheme without pairing and demonstrate it is provably secure in the random oracle model.
In order to achieve secure signcryption schemes in the quantum era, Li Fagen et al. [Concurrency and Computation: Practice and Experience, 2012, 25(4): 2112–2122] and Wang Fenghe et al. [Applied Mathematics & Information Sciences, 2012, 6(1): 23–28] have independently extended the concept of signcryption to lattice-based cryptography. However, their schemes are only secure under the random oracle model. In this paper, we present a lattice-based signcryption scheme which is secure under the standard model. We prove that our scheme achieves indistinguishability against adaptive chosen-ciphertext attacks (IND-CCA2) under the learning with errors (LWE) assumption and existential unforgeability against adaptive chosen-message attacks (EUFCMA) under the small integer solution (SIS) assumption.
We systematically investigate minimum capacity unbalanced cut problems arising in social networks. Let k be an input parameter. A cut (A, B) is unbalanced if the size of its smaller side is at most k (called k-size) or exactly k (called Ek-size). An s-t cut (A, B) is unbalanced if its s-side is either k-size or Ek-size. In the min k-size cut (s-t cut, resp.) problem, we want to find a k-size cut (s-t cut, resp.) with the minimum capacity. The corresponding min Ek-size cut (and s-t cut) problem is defined in a similar way. While the classical min s-t cut problem has been studied extensively, the minimum capacity unbalanced cut problem has only recently attracted the attention of researchers. In this paper, we prove that the min k-size s-t cut problem is NP-hard, and give O(log n)-approximation algorithms for the min k-size s-t cut problem, the min Ek-size s-t cut problem, and the min Eksize cut problem. These results, together with previous results, complete our research into minimum capacity unbalanced cut problems.
Logic Petri nets (LPNs) are suitable to describe and analyze batch processing functions and passing value indeterminacy in cooperative systems. To investigate the dynamic properties of LPNs directly, a new method for analyzing LPNs is proposed based on marking reachability graphs in this paper. Enabled conditions of transitions are obtained and a marking reachability graph is constructed. All reachable markings can be obtained based on the graph; the fairness and reversibility of LPNs are analyzed. Moreover, the computing complexity of the enabled conditions and reachable markings can be reduced by this method. The advantages of the proposed method are illustrated by examples and analysis.
Complicated global climate problems trigger researchers from different scientific disciplines to link multiphysics simulations called models for integrated modeling of climate changes by using a software framework called earth system modeling (ESM). As its critical component, coupler is in charge of connections and interactions among models. With the advance of next-generation models, greater data transfer volume and higher coupling frequency are expected to put heavy performance burden on coupler. High efficient coupling techniques are required. In this paper, we propose the sub-domain mapping method to improve the parallel coupling consisted of data transfer and data transformation. By using one specific interpolation oriented communication routing, the communication operations that are originally decentralized in various steps can be combined together for execution. This can reduce the redundant communications and the entailed synchronization costs. The tests on the Tianhe-1A (TH-1A) supercomputer show that our method can achieve 1.1 to 4.9 fold performance improvements. We also present further optimization solution for the multi-interpolation cases. The test results show that our method can achieve up to 3.4 fold speedup over the original coupling execution of the current climate system.