Ensemble of multiple kNN classifiers for societal risk classification

Jindong Chen, Xijin Tang

Journal of Systems Science and Systems Engineering, 2017, Vol. 26, Issue 4: 433-447. DOI: 10.1007/s11518-017-5346-4

Abstract

Societal risk classification is a fundamental and complex problem in the study of societal risk perception. To carry out societal risk classification, posts from Tianya Forum are selected as the data source, and four representations of BBS posts are applied: string representation, term-frequency representation, TF-IDF representation, and distributed representation. Using edit distance or cosine similarity as the distance metric, four k-Nearest Neighbor (kNN) classifiers based on the different representations are developed and compared. Because the neural network model Paragraph Vector takes word order into account and extracts semantics, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) proves effective for societal risk classification. To further improve performance, kNN-PV is combined with the other three kNN classifiers, each with its own weight, into an ensemble model. The optimal weights are assigned to the kNN classifiers through a brute-force grid search. The experimental results show that, compared with kNN-PV alone, the ensemble method significantly improves Macro-F for societal risk classification.
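The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the vote-fraction combination rule, the weight grid, and the definition of Macro-F as the mean of per-class F1 scores are all assumptions made for illustration, and the paper's own choices may differ. The sketch shows a kNN classifier parameterized by a distance function (edit distance for strings, cosine distance for vectors), a weighted combination of several classifiers' votes, and an exhaustive search over a weight grid that keeps the weights with the best Macro-F on held-out data.

```python
import math
from collections import Counter
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance via the Wagner-Fischer dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def knn_votes(query, train_items, train_labels, distance, k):
    """Fraction of the k nearest training items that carry each label."""
    order = sorted(range(len(train_items)),
                   key=lambda i: distance(query, train_items[i]))
    counts = Counter(train_labels[i] for i in order[:k])
    return {label: c / k for label, c in counts.items()}


def ensemble_predict(vote_dicts, weights):
    """Weighted sum of vote fractions (assumed rule); ties go to the smallest label."""
    scores = Counter()
    for votes, w in zip(vote_dicts, weights):
        for label, v in votes.items():
            scores[label] += w * v
    return max(sorted(scores), key=scores.get)


def macro_f(y_true, y_pred):
    """Mean of per-class F1 scores (assumed definition of Macro-F)."""
    f_scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


def grid_search_weights(per_classifier_votes, y_true, step=0.25):
    """Try every weight combination on the grid; keep the best Macro-F.

    per_classifier_votes[c][n] is the vote dict of classifier c for
    validation item n.
    """
    n_steps = int(round(1 / step))
    grid = [i * step for i in range(n_steps + 1)]
    best_score, best_weights = -1.0, None
    for weights in product(grid, repeat=len(per_classifier_votes)):
        if sum(weights) == 0:
            continue  # an all-zero weighting carries no information
        preds = [ensemble_predict([votes[n] for votes in per_classifier_votes], weights)
                 for n in range(len(y_true))]
        score = macro_f(y_true, preds)
        if score > best_score:
            best_score, best_weights = score, weights
    return best_score, best_weights
```

In this sketch each of the four classifiers would call knn_votes with its own representation and distance (edit_distance for the string representation, cosine_distance for the term-frequency, TF-IDF, and Paragraph Vector representations), and grid_search_weights would be run on a validation set kept separate from the test set. The search cost grows exponentially with the number of classifiers, which is manageable for four classifiers and a coarse grid.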

Keywords

Societal risk classification / Tianya Forum / k-Nearest Neighbor / ensemble / Paragraph Vector

Cite this article

Jindong Chen, Xijin Tang. Ensemble of multiple kNN classifiers for societal risk classification. Journal of Systems Science and Systems Engineering, 2017, 26(4): 433-447. DOI: 10.1007/s11518-017-5346-4


