A multi-level approach to highly efficient recognition of Chinese spam short messages
Weimin WANG , Dan ZHOU
Front. Comput. Sci. ›› 2018, Vol. 12 ›› Issue (1) : 135 -145.
A multi-level approach to highly efficient recognition of Chinese spam short messages
The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution to solving the problem can not only improve the quality of people experiencing the mobile life, but also has a positive role on promoting the analysis of short text occurring in current mobile applications, such as Webchat and microblog. As spam SMSes have characteristics of sparsity, transformation and real-timedness, we propose three methods at different levels, i.e., recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. By combining these methods, we obtain a multi-level approach to spam SMS recognition. In order to enrich the pattern base to reduce manual labor and time, we propose a quasi-pattern learning method, which utilizes quasi-pattern matching results in the pattern matching process. Themethod can learnmany interesting and new patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision rate as high as 95.18%, and a recall rate of 95.51%.
spam short message / spam recognition / similarity computing / pattern learning
| [1] |
|
| [2] |
|
| [3] |
|
| [4] |
|
| [5] |
|
| [6] |
|
| [7] |
|
| [8] |
|
| [9] |
|
| [10] |
|
| [11] |
|
| [12] |
|
| [13] |
|
| [14] |
|
| [15] |
|
| [16] |
|
| [17] |
|
| [18] |
|
| [19] |
|
| [20] |
|
| [21] |
|
| [22] |
|
| [23] |
|
| [24] |
|
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
Supplementary files
/
| 〈 |
|
〉 |