Frontiers of Computer Science

Oct 2012, Volume 6 Issue 5

RESEARCH ARTICLE

Measure oriented training: a targeted approach to imbalanced classification problems

Bo YUAN, Wenhuang LIU

2012, 6(5): 489-497. https://doi.org/10.1007/s11704-012-2943-8

Since the overall prediction error of a classifier on imbalanced problems can be potentially misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques including sampling and cost sensitive learning are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error based objective functions. As a result, there is clearly a gap between the measure according to which the classifier is evaluated and how the classifier is trained. This paper investigates the prospect of explicitly using the appropriate measure itself to search the hypothesis space to bridge this gap. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
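
As an illustration of why overall accuracy can mislead on imbalanced data, the sketch below compares accuracy with G-mean on a toy binary problem. It assumes the common binary definition of G-mean (the geometric mean of sensitivity and specificity); the abstract does not state the exact formula used in the paper, and the function names and toy labels here are hypothetical.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical toy data: 5 positive and 95 negative examples.
y_true = [1] * 5 + [0] * 95

# Classifier A ignores the minority class entirely.
always_negative = [0] * 100

# Classifier B finds 4 of the 5 positives at the cost of 10 false alarms.
balanced = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

for name, y_pred in [("always negative", always_negative), ("balanced", balanced)]:
    print(name, accuracy(y_true, y_pred), g_mean(y_true, y_pred))
```

Classifier A has the higher accuracy yet a G-mean of zero, while classifier B has lower accuracy and a much higher G-mean, which is the kind of gap between evaluation measure and training objective the abstract describes.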
RESEARCH ARTICLE

Improving bagging performance through multi-algorithm ensembles

Kuo-Wei HSU, Jaideep SRIVASTAVA

2012, 6(5): 498-512. https://doi.org/10.1007/s11704-012-1163-6

Working as an ensemble method that establishes a committee of classifiers first and then aggregates their outcomes through majority voting, bagging has attracted considerable research interest and been applied in various application domains. It has demonstrated several advantages, but in its present form, bagging has been found to be less accurate than some other ensemble methods. To unlock its power and expand its user base, we propose an approach that improves bagging through the use of multi-algorithm ensembles. In a multi-algorithm ensemble, multiple classification algorithms are employed. Starting from a study of the nature of diversity, we show that compared to using different training sets alone, using heterogeneous algorithms together with different training sets increases diversity in ensembles, and hence we provide a fundamental explanation for research utilizing heterogeneous algorithms. In addition, we partially address the problem of the relationship between diversity and accuracy by providing a non-linear function that describes the relationship between diversity and correlation. Furthermore, after realizing that the bootstrap procedure is the exclusive source of diversity in bagging, we use heterogeneity as another source of diversity and propose an approach utilizing heterogeneous algorithms in bagging. For evaluation, we consider several benchmark data sets from various application domains. The results indicate that, in terms of F1-measure, our approach outperforms most of the other state-of-the-art ensemble methods considered in experiments and, in terms of mean margin, our approach is superior to all the others considered in experiments.
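
The general idea of bagging with more than one base algorithm can be sketched as follows. This is a minimal toy version and not the authors' method: the two base learners (nearest class mean and 1-nearest-neighbour on a single feature), the round-robin assignment of algorithms to bootstrap samples, and all names are assumptions made for illustration.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]

def train_nearest_mean(sample):
    """Toy learner 1: predict the class whose mean feature value is closest."""
    totals = {}
    for x, y in sample:
        s, n = totals.get(y, (0.0, 0))
        totals[y] = (s + x, n + 1)
    means = {y: s / n for y, (s, n) in totals.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def train_one_nn(sample):
    """Toy learner 2: predict the label of the closest training example."""
    return lambda x: min(sample, key=lambda p: abs(x - p[0]))[1]

def majority_vote(votes):
    """Return the most frequent label among the committee's votes."""
    return Counter(votes).most_common(1)[0][0]

def heterogeneous_bagging(data, trainers, n_members, seed=0):
    """Train each member on its own bootstrap sample, cycling through algorithms."""
    rng = random.Random(seed)
    members = [
        trainers[i % len(trainers)](bootstrap(data, rng))
        for i in range(n_members)
    ]
    return lambda x: majority_vote([m(x) for m in members])

# Hypothetical 1-D data: class 0 near the origin, class 1 around x = 12.
data = [(float(x), 0) for x in range(5)] + [(float(x), 1) for x in range(10, 15)]
predict = heterogeneous_bagging(data, [train_nearest_mean, train_one_nn], n_members=11)
print(predict(1.0), predict(13.0))
```

Here diversity comes from two sources, the bootstrap samples and the differing algorithms, and the committee's outputs are combined by majority voting as in ordinary bagging.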
RESEARCH ARTICLE

A probabilistic model with multi-dimensional features for object extraction

Jing WANG, Zhijing LIU, Hui ZHAO

2012, 6(5): 513-526. https://doi.org/10.1007/s11704-012-1093-3

To identify recruitment information in different domains, we propose a novel model of hierarchical tree-structured conditional random fields (HT-CRFs). In our approach, first, the concept of a Web object (WOB) is discussed for the description of special Web information. Second, in contrast to traditional methods, the Boolean model and multirule are introduced to denote a one-dimensional text feature for a better representation of Web objects. Furthermore, a two-dimensional semantic texture feature is developed to discover the layout of a WOB, which can emphasize the structural attributes and the specific semantics term attributes of WOBs. Third, an optimal WOB information extraction (IE) based on HT-CRF is performed, addressing the problem of a model having an excessive dependence on the page structure and optimizing the efficiency of the model’s training. Finally, we compare the proposed model with existing decoupled approaches for WOB IE. The experimental results show that the accuracy rate of WOB IE is significantly improved and that time complexity is reduced.
RESEARCH ARTICLE

A precise approach to tracking dim-small targets using spectral fingerprint features

Hao SHENG, Chao LI, Yuanxin OUYANG, Zhang XIONG

2012, 6(5): 527-536. https://doi.org/10.1007/s11704-012-1106-2

A precise method for tracking dim-small targets, based on spectral fingerprints, is proposed for cases where traditional full-color tracking seems impossible. A fingerprint model is presented to adequately extract spectral features. By creating a multidimensional feature space and extending the limited RGB information to the hyperspectral information, the improved precise tracking model based on a nonparametric kernel density estimator is built using the probability histogram of spectral features. A layered particle filter algorithm for spectral tracking is presented to avoid the object jumping abruptly. Finally, experiments are conducted that show that the tracking algorithm with spectral fingerprint features is accurate, fast, and robust. It meets the needs of dim-small target tracking adequately.
RESEARCH ARTICLE

Automatic object classification using motion blob based local feature fusion for traffic scene surveillance

Zhaoxiang ZHANG, Yunhong WANG

2012, 6(5): 537-546. https://doi.org/10.1007/s11704-012-1296-7

Automatic object classification in traffic scene videos is an important issue for intelligent visual surveillance with great potential for all kinds of security applications. However, this problem is very challenging for the following reasons. Firstly, regions of interest in videos are of low resolution and limited size due to the capacity of conventional surveillance cameras. Secondly, the intra-class variations are very large due to changes of view angles, lighting conditions, and environments. Thirdly, real-time performance of algorithms is always required for real applications. In this paper, we evaluate the performance of local feature descriptors for automatic object classification in traffic scenes. Image intensity or gradient information is directly used to construct effective feature vectors from regions of interest extracted via motion detection. This strategy has great advantages of efficiency compared to various complicated texture features. We not only analyze and evaluate the performance of different feature descriptors, but also fuse different scales and features to achieve better performance. Numerous experiments are conducted and experimental results demonstrate the efficiency and effectiveness of this strategy with robustness to noise, variance of view angles, lighting conditions, and environments.
RESEARCH ARTICLE

The ClasSi coefficient for the evaluation of ranking quality in the presence of class similarities

Anca Maria IVANESCU, Marc WICHTERICH, Christian BEECKS, Thomas SEIDL

2012, 6(5): 568-580. https://doi.org/10.1007/s11704-012-1175-2

Evaluation measures play an important role in the design of new approaches, and often quality is measured by assessing the relevance of the obtained result set. While many evaluation measures based on precision/recall are based on a binary relevance model, ranking correlation coefficients are better suited for multi-class problems. State-of-the-art ranking correlation coefficients like Kendall’s τ and Spearman’s ρ do not allow the user to specify similarities between differing object classes and thus treat the transposition of objects from similar classes the same way as that of objects from dissimilar classes. We propose ClasSi, a new ranking correlation coefficient which deals with class label rankings and employs a class distance function to model the similarities between the classes. We also introduce a graphical representation of ClasSi which describes how the correlation evolves throughout the ranking.
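
The limitation of Kendall's τ noted above can be seen in a small example. The sketch below implements the plain pair-counting form of Kendall's τ for rankings without ties and applies it to two rankings that each differ from a reference by one transposition. It does not implement ClasSi, whose definition is not given in the abstract; the object names and their implied classes are hypothetical.

```python
from itertools import combinations

def kendall_tau(reference, ranking):
    """Kendall's tau for two tie-free rankings of the same items."""
    position = {item: i for i, item in enumerate(ranking)}
    concordant = discordant = 0
    # In each pair, a precedes b in the reference ranking.
    for a, b in combinations(reference, 2):
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical reference ranking: two cats followed by two vehicles.
reference = ["tabby_cat", "siamese_cat", "truck", "bus"]

# Transposition within one class (cat swapped with cat).
similar_swap = ["siamese_cat", "tabby_cat", "truck", "bus"]

# Transposition across dissimilar classes (cat swapped with truck).
dissimilar_swap = ["tabby_cat", "truck", "siamese_cat", "bus"]

print(kendall_tau(reference, similar_swap))
print(kendall_tau(reference, dissimilar_swap))
```

Both rankings receive the same score because each contains exactly one discordant pair, even though swapping two cats is arguably a much smaller error than ranking a truck above a cat. A class distance function, as the abstract proposes, is one way to tell these cases apart.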
RESEARCH ARTICLE

An efficient approach for continuous density queries

Jie WEN, Xiaofeng MENG, Xing HAO, Jianliang XU

2012, 6(5): 581-595. https://doi.org/10.1007/s11704-012-1120-4

In location-based services, a density query returns the regions with high concentrations of moving objects (MOs). The use of density queries can help users identify crowded regions so as to avoid congestion. Most of the existing methods try very hard to improve the accuracy of query results, but ignore query efficiency. However, response time is also an important concern in query processing and may have an impact on user experience. In order to address this issue, we present a new definition of continuous density queries. Our approach for processing continuous density queries is based on the new notion of a safe interval, using which the states of both dense and sparse regions are dynamically maintained. Two indexing structures are also used to index candidate regions for accelerating query processing and improving the quality of results. The efficiency and accuracy of our approach are shown through an experimental comparison with snapshot density queries.
REVIEW ARTICLE

Rethinking the architecture design of data center networks

Kaishun WU, Jiang XIAO, Lionel M. NI

2012, 6(5): 596-603. https://doi.org/10.1007/s11704-012-1155-6

In the rising tide of the Internet of things, more and more things in the world are connected to the Internet. Recently, data have kept growing at a rate more than four times that expected under Moore’s law. This explosion of data comes from various sources such as mobile phones, video cameras and sensor networks, which often present multidimensional characteristics. The huge amount of data brings many challenges for the IT infrastructures that manage, transport, and process it. To address these challenges, state-of-the-art large scale data center networks have begun to provide cloud services that are increasingly prevalent. However, how to build a good data center remains an open challenge. Concurrently, the architecture design, which significantly affects the total performance, is of great research interest. This paper surveys advances in data center network design. We first introduce the upcoming trends in the data center industry. Then we review some popular design principles for today’s data center network architectures. In the third part, we present some up-to-date data center frameworks and make a comprehensive comparison of them. During the comparison, we observe that there is no so-called optimal data center and that the design should differ according to the data placement, replication, processing, and query processing. After that, several existing challenges and limitations are discussed. According to these observations, we point out some possible future research directions.
RESEARCH ARTICLE

Fostering artificial societies using social learning and social control in parallel emergency management systems

Wei DUAN, Xiaogang QIU

2012, 6(5): 604-610. https://doi.org/10.1007/s11704-012-1166-3

How can we foster and grow artificial societies so as to cause social properties to emerge that are logical, consistent with real societies, and are expected by designers? We propose a framework for fostering artificial societies using social learning mechanisms and social control approaches. We present the application of fostering artificial societies in parallel emergency management systems. Then we discuss social learning mechanisms in artificial societies, including observational learning, reinforcement learning, imitation learning, and advice-based learning. Furthermore, we discuss social control approaches, including social norms, social policies, social reputations, social commitments, and sanctions.
REVIEW ARTICLE

Social influence and spread dynamics in social networks

Xiaolong ZHENG, Yongguang ZHONG, Daniel ZENG, Fei-Yue WANG

2012, 6(5): 611-620. https://doi.org/10.1007/s11704-012-1176-1

Social networks often serve as a critical medium for information dissemination, diffusion of epidemics, and spread of behavior, by shared activities or similarities between individuals. Recently, we have witnessed an explosion of interest in studying social influence and spread dynamics in social networks. To date, relatively little material has been provided on a comprehensive review in this field. This brief survey addresses this issue. We present the current significant empirical studies on real social systems, including network construction methods, measures of network, and newly empirical results. We then provide a concise description of some related social models from both macro- and micro-level perspectives. Due to the difficulties in combining real data and simulation data for verifying and validating real social systems, we further emphasize the current research results of computational experiments. We hope this paper can provide researchers significant insights into better understanding the characteristics of personal influence and spread patterns in large-scale social systems.
RESEARCH ARTICLE

A co-evolving memetic wrapper for prediction of patient outcomes in TCM informatics

Dion DETTERER, Paul KWAN, Cedric GONDRO

2012, 6(5): 621-629. https://doi.org/10.1007/s11704-012-2959-0

Traditional Chinese medicine (TCM) relies on the combined effects of herbs within prescribed formulae. However, given the combinatorial explosion due to the vast number of herbs available for treatment, the study of these combined effects can become computationally intractable. Thus feature selection has become increasingly crucial as a pre-processing step prior to the study of combined effects in TCM informatics. In accord with this goal, a new feature selection algorithm known as a co-evolving memetic wrapper (COW) is proposed in this paper. COW takes advantage of recent research in genetic algorithms (GAs) and memetic algorithms (MAs) by evolving appropriate feature subsets for a given domain. Our empirical experiments have demonstrated that COW is capable of selecting subsets of herbs from a TCM insomnia dataset that show signs of combined effects on the prediction of patient outcomes measured in terms of classification accuracy. We compare the proposed algorithm with results from statistical analysis including main effects and up to three-way interaction terms and show that COW is capable of correctly identifying the herbs and herb-by-herb effects that are significantly associated with patient outcome prediction.
