Since the overall prediction error of a classifier on imbalanced problems can be potentially misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques, including sampling and cost-sensitive learning, are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error-based objective functions. As a result, there is clearly a gap between the measure according to which the classifier is evaluated and the way the classifier is trained. This paper investigates the prospect of explicitly using the appropriate measure itself to search the hypothesis space and bridge this gap. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method achieves consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
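The key move here is to make the evaluation measure itself the training objective. A minimal Python sketch of a G-mean fitness function, assuming a binary problem with labels 0/1; `network.set_weights` and `network.predict` are hypothetical stand-ins for the evolved three-layer network, not the authors' implementation:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for a binary problem.

    Unlike overall accuracy, G-mean stays low unless the classifier
    performs well on BOTH the minority and the majority class.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return np.sqrt(sensitivity * specificity)

def ga_fitness(weight_vector, network, X, y):
    """GA fitness: decode a chromosome into network weights, score by G-mean."""
    network.set_weights(weight_vector)    # hypothetical decoder on the network
    return g_mean(y, network.predict(X))  # maximized in place of raw accuracy
```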
Working as an ensemble method that first establishes a committee of classifiers and then aggregates their outcomes through majority voting, bagging has attracted considerable research interest and has been applied in various application domains. It has demonstrated several advantages, but in its present form, bagging has been found to be less accurate than some other ensemble methods. To unlock its power and expand its user base, we propose an approach that improves bagging through the use of multi-algorithm ensembles, in which multiple classification algorithms are employed. Starting from a study of the nature of diversity, we show that, compared to using different training sets alone, using heterogeneous algorithms together with different training sets increases diversity in ensembles, and hence we provide a fundamental explanation for research utilizing heterogeneous algorithms. In addition, we partially address the problem of the relationship between diversity and accuracy by providing a non-linear function that describes the relationship between diversity and correlation. Furthermore, after observing that the bootstrap procedure is the exclusive source of diversity in bagging, we use heterogeneity as another source of diversity and propose an approach utilizing heterogeneous algorithms in bagging. For evaluation, we consider several benchmark data sets from various application domains. The results indicate that, in terms of F1-measure, our approach outperforms most of the other state-of-the-art ensemble methods considered in the experiments and, in terms of mean margin, it is superior to all the others considered.
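To illustrate heterogeneity as a second source of diversity alongside the bootstrap, the following sketch (a simplification under stated assumptions, not the authors' exact procedure) builds a bagged committee whose members cycle through several scikit-learn algorithms; NumPy arrays and integer class labels are assumed for the voting step:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def heterogeneous_bagging(X, y, n_estimators=15, seed=None):
    """Bagging whose committee mixes several base algorithms.

    Diversity comes from two sources: bootstrap resampling of the
    training set AND the heterogeneity of the base learners themselves.
    """
    rng = np.random.default_rng(seed)
    pool = [DecisionTreeClassifier(), GaussianNB(), KNeighborsClassifier()]
    ensemble, n = [], len(X)
    for i in range(n_estimators):
        idx = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
        clf = clone(pool[i % len(pool)])   # cycle through the algorithm pool
        ensemble.append(clf.fit(X[idx], y[idx]))
    return ensemble

def majority_vote(ensemble, X):
    """Aggregate member predictions by unweighted majority voting.

    Assumes non-negative integer class labels (required by np.bincount).
    """
    preds = np.stack([clf.predict(X) for clf in ensemble])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```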
To identify recruitment information across different domains, we propose a novel model of hierarchical tree-structured conditional random fields (HT-CRFs). In our approach, first, the concept of a Web object (WOB) is introduced for the description of special Web information. Second, in contrast to traditional methods, the Boolean model and a multi-rule representation are introduced to denote a one-dimensional text feature for a better representation of Web objects. Furthermore, a two-dimensional semantic texture feature is developed to capture the layout of a WOB, emphasizing both the structural attributes and the specific semantic term attributes of WOBs. Third, optimal WOB information extraction (IE) based on HT-CRFs is performed, addressing the model's excessive dependence on page structure and improving the efficiency of model training. Finally, we compare the proposed model with existing decoupled approaches for WOB IE. The experimental results show that the accuracy of WOB IE is significantly improved and that time complexity is reduced.
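As a small illustration of the Boolean-model text feature mentioned above, each Web-object node can be encoded as a binary term-presence vector; the vocabulary and node text below are hypothetical examples, not taken from the paper:

```python
def boolean_features(tokens, vocabulary):
    """Boolean-model text feature: 1 if a vocabulary term occurs in the
    node's text, 0 otherwise (term frequency is deliberately ignored)."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

# Hypothetical recruitment-domain vocabulary and node text.
vocab = ["salary", "experience", "degree", "location"]
print(boolean_features("5 years experience required and competitive salary".split(), vocab))
# -> [1, 1, 0, 0]
```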
A precise method for accurately tracking dim-small targets based on spectral fingerprints is proposed for situations where traditional full-color tracking is infeasible. A fingerprint model is presented to adequately extract spectral features. By creating a multidimensional feature space and extending the limited RGB information to hyperspectral information, an improved precise tracking model based on a nonparametric kernel density estimator is built using the probability histogram of spectral features. A layered particle filter algorithm for spectral tracking is presented to prevent the tracked object from jumping abruptly. Finally, experiments show that the tracking algorithm with spectral fingerprint features is accurate, fast, and robust, adequately meeting the needs of dim-small target tracking.
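One common way to realize such histogram-based matching inside a particle filter is a Bhattacharyya-similarity weighting of candidate regions; the sketch below assumes each fingerprint is a normalized histogram of spectral feature values scaled to [0, 1], and is a generic formulation rather than the authors' exact estimator:

```python
import numpy as np

def spectral_fingerprint(pixels, n_bins=16):
    """Normalized histogram of spectral feature values (the 'fingerprint').

    Assumes the spectral feature values are pre-scaled to [0, 1].
    """
    hist, _ = np.histogram(pixels, bins=n_bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def bhattacharyya(p, q):
    """Similarity between candidate and reference fingerprints, in [0, 1]."""
    return np.sum(np.sqrt(p * q))

def particle_weights(candidates, reference, sigma=0.1):
    """Weight each particle by how well its region's fingerprint matches.

    Converts Bhattacharyya similarity to a distance, then to a Gaussian
    likelihood, so close spectral matches dominate the resampling step.
    """
    d = np.array([np.sqrt(1.0 - bhattacharyya(c, reference)) for c in candidates])
    return np.exp(-(d ** 2) / (2 * sigma ** 2))
```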
Automatic object classification in traffic scene videos is an important issue for intelligent visual surveillance, with great potential for all kinds of security applications. However, the problem is very challenging for the following reasons. Firstly, regions of interest in videos are of low resolution and limited size due to the capabilities of conventional surveillance cameras. Secondly, intra-class variations are very large due to changes in view angle, lighting conditions, and environments. Thirdly, real-time performance is always required in real applications. In this paper, we evaluate the performance of local feature descriptors for automatic object classification in traffic scenes. Image intensity or gradient information is used directly to construct effective feature vectors from regions of interest extracted via motion detection. This strategy offers great efficiency advantages over various complicated texture features. We not only analyze and evaluate the performance of different feature descriptors, but also fuse different scales and features to achieve better performance. Numerous experiments are conducted, and the results demonstrate the efficiency and effectiveness of this strategy, along with its robustness to noise and to variations in view angle, lighting conditions, and environments.
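As a sketch of the kind of simple gradient-based descriptor such an evaluation covers (a generic orientation histogram over a region of interest, not a specific descriptor from the paper):

```python
import numpy as np

def gradient_descriptor(patch, n_bins=9):
    """Orientation histogram of image gradients over a low-resolution ROI.

    Intensity gradients are pooled into an orientation histogram weighted
    by gradient magnitude, then L2-normalized, yielding a cheap feature
    vector compared to heavier texture descriptors.
    """
    gy, gx = np.gradient(patch.astype(float))       # axis 0 = rows (y)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)         # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-12)
```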
Evaluation measures play an important role in the design of new approaches, and quality is often measured by assessing the relevance of the obtained result set. While many evaluation measures based on precision/recall rely on a binary relevance model, ranking correlation coefficients are better suited for multi-class problems. A state-of-the-art example of such a ranking correlation coefficient is Kendall's τ.
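For reference, Kendall's τ measures agreement between two rankings by counting concordant versus discordant item pairs; a quick check with SciPy (the two rankings here are made up for illustration):

```python
from scipy.stats import kendalltau

# Two made-up rankings of the same five items (rank position per item).
reference = [1, 2, 3, 4, 5]
system    = [2, 1, 3, 5, 4]

tau, p_value = kendalltau(reference, system)
print(f"Kendall's tau = {tau:.2f}")  # 0.60: 8 concordant vs. 2 discordant pairs
```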
In location-based services, a density query returns the regions with high concentrations of moving objects (MOs). Density queries can help users identify crowded regions so as to avoid congestion. Most existing methods try very hard to improve the accuracy of query results but ignore query efficiency. However, response time is also an important concern in query processing and may affect user experience. To address this issue, we present a new definition of continuous density queries. Our approach to processing continuous density queries is based on the new notion of a safe interval, by which the states of both dense and sparse regions are dynamically maintained. Two indexing structures are also used to index candidate regions, accelerating query processing and improving the quality of results. The efficiency and accuracy of our approach are shown through an experimental comparison with snapshot density queries.
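A minimal sketch of how a region's safe interval might be represented in code; the field names and the re-evaluation rule are illustrative assumptions, not the paper's definitions:

```python
from dataclasses import dataclass

@dataclass
class RegionState:
    """Candidate region tracked by a continuous density query.

    Within [t_start, t_end) the region is assumed to keep its current
    dense/sparse label, so the query processor can skip re-evaluating it.
    """
    region_id: int
    mo_count: int      # moving objects currently inside the region
    dense: bool        # does mo_count / area exceed the density threshold?
    t_start: float     # when the safe interval began
    t_end: float       # earliest time an object can enter or leave

def needs_reevaluation(state: RegionState, now: float) -> bool:
    """Re-evaluate a region only once its safe interval has expired."""
    return now >= state.t_end
```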
In the rising tide of the Internet of Things, more and more things in the world are connected to the Internet. Recently, data have kept growing at more than four times the rate predicted by Moore's law. This explosion of data comes from various sources such as mobile phones, video cameras, and sensor networks, and the data often exhibit multidimensional characteristics. The huge amount of data poses many challenges to the IT infrastructures for data management, transport, and processing. To address these challenges, state-of-the-art large-scale data center networks have begun to provide cloud services, which are increasingly prevalent. However, how to build a good data center remains an open challenge, and the architecture design, which significantly affects overall performance, is of great research interest. This paper surveys advances in data center network design. We first introduce the upcoming trends in the data center industry. Then we review some popular design principles for today's data center network architectures. In the third part, we present some up-to-date data center frameworks and make a comprehensive comparison of them. From the comparison, we observe that there is no single optimal data center design; rather, the design should vary according to the requirements of data placement, replication, processing, and query processing. After that, several existing challenges and limitations are discussed. Based on these observations, we point out some possible future research directions.
How can we foster and grow artificial societies so that the social properties that emerge are logical, consistent with real societies, and in line with designers' expectations? We propose a framework for fostering artificial societies using social learning mechanisms and social control approaches. We present an application of fostering artificial societies in parallel emergency management systems. We then discuss social learning mechanisms in artificial societies, including observational learning, reinforcement learning, imitation learning, and advice-based learning. Furthermore, we discuss social control approaches, including social norms, social policies, social reputations, social commitments, and sanctions.
Social networks often serve as a critical medium for information dissemination, the diffusion of epidemics, and the spread of behavior through shared activities or similarities between individuals. Recently, we have witnessed an explosion of interest in studying social influence and spread dynamics in social networks. To date, however, relatively little comprehensive review material has been available in this field; this brief survey addresses that gap. We present the current significant empirical studies on real social systems, including network construction methods, network measures, and new empirical results. We then provide a concise description of some related social models from both macro- and micro-level perspectives. Because of the difficulty of combining real data and simulation data when verifying and validating models of real social systems, we further emphasize current research results from computational experiments. We hope this paper provides researchers with significant insights into the characteristics of personal influence and spread patterns in large-scale social systems.
Traditional Chinese medicine (TCM) relies on the combined effects of herbs within prescribed formulae. However, given the combinatorial explosion caused by the vast number of herbs available for treatment, the study of these combined effects can become computationally intractable. Feature selection has therefore become increasingly crucial as a pre-processing step prior to the study of combined effects in TCM informatics. Toward this goal, a new feature selection algorithm known as the co-evolving memetic wrapper (COW) is proposed in this paper. COW takes advantage of recent research in genetic algorithms (GAs) and memetic algorithms (MAs) by evolving appropriate feature subsets for a given domain. Our empirical experiments demonstrate that COW is capable of selecting subsets of herbs from a TCM insomnia dataset that show signs of combined effects on the prediction of patient outcomes, measured in terms of classification accuracy. We compare the proposed algorithm with results from statistical analysis including main effects and up to three-way interaction terms, and show that COW correctly identifies the herbs and herb-by-herb effects that are significantly associated with patient outcome prediction.
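To make the wrapper idea concrete, the sketch below shows a wrapper fitness function over binary herb masks plus a greedy memetic (local-search) refinement step; the classifier choice and cross-validation settings are assumptions, and COW's co-evolutionary machinery is omitted:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def wrapper_fitness(mask, X, y):
    """Wrapper criterion: CV accuracy of a classifier on the selected herbs."""
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def local_search(mask, X, y):
    """Memetic refinement: greedy single-bit flips while fitness improves."""
    best = wrapper_fitness(mask, X, y)
    improved = True
    while improved:
        improved = False
        for j in range(len(mask)):
            trial = mask.copy()
            trial[j] = ~trial[j]          # include/exclude herb j
            score = wrapper_fitness(trial, X, y)
            if score > best:
                mask, best, improved = trial, score, True
    return mask, best

# Usage sketch: refine a random boolean mask over, say, 20 herb features.
#   rng = np.random.default_rng(0)
#   mask0 = rng.random(20) < 0.5
#   best_mask, score = local_search(mask0, X, y)
```

In a full memetic algorithm, a GA would evolve a population of such masks between rounds of this refinement, trading exploration (crossover, mutation) against the exploitation provided by the local search.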