Frontiers of Computer Science

Front. Comput. Sci. 2018, Vol. 12, Issue (1): 135-145
A multi-level approach to highly efficient recognition of Chinese spam short messages
Weimin WANG1, Dan ZHOU2
1. School of Computer Science & Engineering, Jiangsu University of Science and Technology, Jiangsu 212003, China
2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China
The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution to the problem can not only improve people's experience of mobile life, but also help promote the analysis of short texts in current mobile applications such as WeChat and microblogs. As spam SMSes exhibit sparsity, transformation, and real-timedness, we propose three methods at different levels: recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. By combining these methods, we obtain a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method, which utilizes quasi-pattern matching results from the pattern matching process. The method can learn many interesting and new patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision as high as 95.18% and a recall of 95.51%.

Keywords: spam short message; spam recognition; similarity computing; pattern learning
Corresponding Author: Weimin WANG
Just Accepted Date: 25 February 2016; Online First Date: 07 June 2017; Issue Date: 12 January 2018
Weimin WANG, Dan ZHOU. A multi-level approach to highly efficient recognition of Chinese spam short messages[J]. Front. Comput. Sci., 2018, 12(1): 135-145.
