Frontiers of Computer Science

Front. Comput. Sci. 2017, Vol. 11, Issue 5: 852-862. DOI: 10.1007/s11704-016-5144-z
RESEARCH ARTICLE
Ranking and tagging bursty features in text streams with context language models
Wayne Xin ZHAO1,2, Chen LIU3, Ji-Rong WEN1,2, Xiaoming LI4
1. School of Information, Renmin University of China, Beijing 100872, China
2. Beijing Key Laboratory of Big Data Management and Analysis Methods, Renmin University of China, Beijing 100872, China
3. Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144, China
4. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
Abstract

Detecting and using bursty patterns to analyze text streams is one of the fundamental approaches in many temporal text mining applications. Most existing studies have focused on methods that detect bursty features based purely on changes in term frequency. Few have taken the semantic contexts of bursty features into consideration; as a result, the detected bursty features are not always interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models, one based on sentence-level context and one based on document-level context. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we show quantitatively that the proposed context language models help both to rank bursty features by newsworthiness and to annotate them with meaningful tags. We also use two example text mining applications to demonstrate qualitatively the usefulness of bursty feature ranking and tagging.
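
The abstract does not give the estimation formulas, so the following is only a minimal sketch of what a context language model for a bursty feature could look like, not the authors' method. It assumes a plain maximum-likelihood unigram distribution over the words that co-occur with the feature, where the co-occurrence unit is either a sentence or a whole document. The function name `context_model`, the regular-expression tokenizer, and the punctuation-based sentence splitter are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def _tokens(text):
    # Assumption: lowercase alphanumeric tokens are a good enough tokenization.
    return re.findall(r"[a-z0-9]+", text.lower())


def context_model(documents, feature, level="sentence"):
    """Estimate a unigram context distribution for `feature`.

    level="sentence": count words sharing a sentence with the feature.
    level="document": count words sharing a document with the feature.
    Returns a dict mapping word -> probability (empty if the feature never occurs).
    """
    if level not in ("sentence", "document"):
        raise ValueError("level must be 'sentence' or 'document'")
    counts = Counter()
    for doc in documents:
        # Assumption: sentences end at '.', '!' or '?'.
        units = re.split(r"[.!?]+", doc) if level == "sentence" else [doc]
        for unit in units:
            toks = _tokens(unit)
            if feature in toks:
                # The feature itself is excluded from its own context.
                counts.update(t for t in toks if t != feature)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    docs = [
        "The earthquake hit Japan. Markets fell.",
        "Japan earthquake relief began.",
    ]
    model = context_model(docs, "earthquake", level="sentence")
    # Highest-probability context words are natural candidates for tags.
    for word, prob in sorted(model.items(), key=lambda kv: -kv[1])[:3]:
        print(word, round(prob, 3))
```

In this sketch the sentence-level model is narrower than the document-level one: words such as "markets" enter the context only when whole documents are used as the co-occurrence unit. A practical estimator would likely also need smoothing and a contrast against a background model, neither of which is attempted here.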

Keywords: bursty features; bursty features ranking; bursty feature tagging; context modeling
Just Accepted Date: 10 December 2015   Online First Date: 18 July 2016    Issue Date: 26 September 2017
Cite this article:
Wayne Xin ZHAO, Chen LIU, Ji-Rong WEN, et al. Ranking and tagging bursty features in text streams with context language models[J]. Front. Comput. Sci., 2017, 11(5): 852-862.
URL:
http://journal.hep.com.cn/fcs/EN/10.1007/s11704-016-5144-z
http://journal.hep.com.cn/fcs/EN/Y2017/V11/I5/852