A multi-level approach to highly efficient recognition of Chinese spam short messages
Weimin WANG, Dan ZHOU
The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution to this problem can not only improve people's experience of mobile life, but also help promote the analysis of the short texts found in current mobile applications, such as WeChat and microblogs. Because spam SMSes are sparse, frequently transformed, and time-sensitive, we propose three methods at different levels: recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. By combining these methods, we obtain a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method, which uses the quasi-pattern matching results obtained during the pattern matching process. The method can learn many interesting new patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision of 95.18% and a recall of 95.51%.
spam short message / spam recognition / similarity computing / pattern learning
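The abstract names three recognition levels but does not say how each is computed or how they are combined. The following is a minimal sketch of one possible cascade, under explicit assumptions that are not taken from the paper: "symbolic features" are approximated by the density of symbols inserted between characters, "text similarity" by character-bigram Jaccard similarity against known spam, and "pattern matching" by ordered keyword patterns. All function names, thresholds, and example messages are hypothetical, and the quasi-pattern learning step is not reproduced.

```python
import re

# Hypothetical sketch: a three-level cascade loosely following the levels
# named in the abstract. Features, thresholds, and data are assumptions.


def bigrams(text: str) -> set:
    """Character bigrams, ignoring whitespace (no word segmentation needed)."""
    chars = [c for c in text if not c.isspace()]
    return {chars[i] + chars[i + 1] for i in range(len(chars) - 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two bigram sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def symbol_ratio(msg: str) -> float:
    """Share of non-space characters that are neither letters nor digits.
    Chinese ideographs count as letters, so only punctuation and other
    symbols (e.g. the '*' in '代*开') raise the ratio."""
    chars = [c for c in msg if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if not c.isalnum()) / len(chars)


def compile_pattern(keywords: list) -> re.Pattern:
    """An assumed pattern form: keywords in order, any text in between."""
    return re.compile(".*?".join(re.escape(k) for k in keywords), re.DOTALL)


def classify(msg: str, known_spam: list, patterns: list,
             symbol_threshold: float = 0.3,
             sim_threshold: float = 0.5) -> str:
    # Level 1: symbolic features (assumed: density of inserted symbols).
    if symbol_ratio(msg) >= symbol_threshold:
        return "spam:symbolic"
    # Level 2: similarity to previously confirmed spam messages.
    grams = bigrams(msg)
    if any(jaccard(grams, bigrams(s)) >= sim_threshold for s in known_spam):
        return "spam:similarity"
    # Level 3: match against a base of keyword patterns.
    if any(p.search(msg) for p in patterns):
        return "spam:pattern"
    return "ham"


# Illustrative data only.
KNOWN_SPAM = ["代开各类发票，联系电话见下"]
PATTERNS = [compile_pattern(["代开", "发票"])]

if __name__ == "__main__":
    for m in ["代*开*发*票", "代开各类发票，联系电话见下",
              "本店可代开正规发票", "明天下午三点开会"]:
        print(m, "->", classify(m, KNOWN_SPAM, PATTERNS))
    # Expected labels, in order: spam:symbolic, spam:similarity,
    # spam:pattern, ham
```

In this sketch the cascade returns at the first level that fires, so the cheap character-level check runs before the similarity search over the spam base. Whether the authors order or weight the levels this way is not stated in the abstract.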