A new feature selection method for handling redundant information in text classification

You-wei WANG, Li-zhou FENG

Front. Inform. Technol. Electron. Eng ›› 2018, Vol. 19 ›› Issue (2) : 221 -234.

DOI: 10.1631/FITEE.1601761
Original Article

Abstract

Feature selection is an important approach to dimensionality reduction in text classification. Because redundant information in the selected features is difficult to handle, we propose a new, simple feature selection method that effectively filters redundant features. First, to quantify the relationship between two words, we introduce the definitions of word-frequency-based relevance and correlative redundancy. Next, an optimal feature selection (OFS) method is chosen to obtain a feature subset FS1. Finally, to improve execution speed, the redundant features in FS1 are filtered using a predetermined threshold, and the filtered features are stored in linked lists. Experiments are carried out on three datasets (WebKB, 20-Newsgroups, and Reuters-21578), with support vector machines and naïve Bayes as classifiers. The results show that the classification accuracy of the proposed method is generally higher than that of typical traditional methods (information gain, improved Gini index, and improved comprehensively measured feature selection) and of the OFS methods. Moreover, the proposed method runs faster than typical mutual information-based methods (improved and normalized mutual information-based feature selections, and multilabel feature selection based on maximum dependency and minimum redundancy) while maintaining classification accuracy. Statistical results validate the effectiveness of the proposed method in handling redundant information in text classification.
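The threshold-based filtering step can be illustrated with a minimal sketch. The abstract does not give the formulas for relevance or correlative redundancy, so this example substitutes cosine similarity between term-frequency columns as a stand-in redundancy measure, and it assumes FS1 arrives as a list of term indices already ranked best-first by some OFS method. The names `redundancy`, `filter_redundant`, and the toy matrix `tf` are hypothetical, and a plain Python list is used where the paper uses linked lists.

```python
import numpy as np

def redundancy(a, b):
    """Stand-in redundancy score: cosine similarity of two term-frequency
    vectors. This is an assumption, not the paper's definition."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(np.dot(a, b) / denom)

def filter_redundant(tf, ranked, threshold):
    """Greedily keep a ranked feature only if its redundancy with every
    already-kept feature does not exceed the threshold.

    tf        : (n_docs, n_terms) term-frequency matrix
    ranked    : term indices of FS1, best first (assumed given by OFS)
    threshold : predetermined redundancy cutoff
    """
    kept = []
    for j in ranked:
        if all(redundancy(tf[:, j], tf[:, k]) <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: term 1 is an exact multiple of term 0, so it is redundant.
tf = np.array([
    [2, 4, 0, 1],
    [1, 2, 3, 0],
    [0, 0, 1, 2],
    [3, 6, 0, 1],
], dtype=float)

selected = filter_redundant(tf, ranked=[0, 1, 2, 3], threshold=0.9)
print(selected)
```

With this toy matrix, term 1 is dropped because its frequency column is proportional to that of term 0, while terms 2 and 3 fall below the cutoff and are retained.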

Keywords

Feature selection / Dimensionality reduction / Text classification / Redundant features / Support vector machine / Naïve Bayes / Mutual information

Cite this article

You-wei WANG, Li-zhou FENG. A new feature selection method for handling redundant information in text classification. Front. Inform. Technol. Electron. Eng, 2018, 19(2): 221-234. DOI: 10.1631/FITEE.1601761

RIGHTS & PERMISSIONS

Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Supplementary files

FITEE-0221-18006-YWW_suppl_1

FITEE-0221-18006-YWW_suppl_2
