Frontiers of Electrical and Electronic Engineering
Can prior knowledge help graph-based methods for keyword extraction?
Received date: 28 Apr 2011
Accepted date: 30 Aug 2011
Published date: 05 Jun 2012
Graph-based methods are among the most widely used unsupervised approaches to keyword extraction. In these methods, words are linked according to their co-occurrences within the document; a graph-based ranking algorithm then scores the words, and those with the highest scores are selected as keywords. Although graph-based methods are effective for keyword extraction, they rank words solely on the topology of the word graph. In practice, various kinds of prior knowledge indicate how likely a word is to be a keyword. This knowledge may be frequency-based, position-based, or semantic-based. In this paper, we propose incorporating prior knowledge into graph-based methods for keyword extraction and investigate the contributions of this prior knowledge. Experiments reveal that prior knowledge can significantly improve the performance of graph-based keyword extraction. Moreover, by combining prior knowledge with neighborhood knowledge, we achieve the best results in our experiments compared with previous graph-based methods.
Key words: keyword extraction; prior knowledge; PageRank; DiffusionRank
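The pipeline outlined in the abstract (link words by co-occurrence, rank them on the graph, and let prior knowledge influence the ranking) can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything here is an assumption for illustration: the function names `build_cooccurrence_graph` and `rank_words`, the window size, the damping factor, and in particular the choice to inject the prior through the random-jump vector of a PageRank-style iteration. The position-based prior in the usage lines is likewise a hypothetical example, not the paper's definition.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link each word to the words that follow it within `window` positions."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                # Undirected, weighted edge: count each co-occurrence once per direction.
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def rank_words(graph, prior=None, damping=0.85, iterations=100):
    """PageRank-style power iteration on the word graph.

    `prior` maps words to non-negative weights (at least one must be positive).
    It replaces the uniform random-jump vector; this is an assumed way of
    combining prior knowledge with the graph, not necessarily the paper's.
    """
    words = sorted(graph)
    if prior is None:
        prior = {w: 1.0 for w in words}
    total = sum(prior.get(w, 0.0) for w in words)
    jump = {w: prior.get(w, 0.0) / total for w in words}
    out_weight = {w: sum(graph[w].values()) for w in words}

    scores = dict(jump)
    for _ in range(iterations):
        updated = {}
        for w in words:
            # Score flowing into w from each neighbour v, split by edge weight.
            incoming = sum(scores[v] * graph[v][w] / out_weight[v] for v in graph[w])
            updated[w] = (1 - damping) * jump[w] + damping * incoming
        scores = updated
    return scores


# Usage: a hypothetical position-based prior that favours words appearing early.
tokens = "graph methods rank words by graph topology and prior knowledge".split()
graph = build_cooccurrence_graph(tokens, window=2)
position_prior = {}
for index, word in enumerate(tokens):
    position_prior.setdefault(word, 1.0 / (index + 1))
scores = rank_words(graph, prior=position_prior)
top_words = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_words)
```

With `prior=None` the sketch reduces to ranking by graph topology alone, which is the baseline behaviour the abstract describes; passing a frequency-based or semantic-based weight map instead of `position_prior` would exercise the other kinds of prior knowledge in the same way.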
Zhiyuan LIU, Maosong SUN. Can prior knowledge help graph-based methods for keyword extraction?[J]. Frontiers of Electrical and Electronic Engineering, 2012, 7(2): 242-253. DOI: 10.1007/s11460-011-0174-7