Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation
Ruifeng XU, Lin GUI, Qin LU, Shuai WANG, Jian XU
Person name disambiguation aims to distinguish different entities that share the same person name by linking each document to the entity it refers to. The traditional disambiguation approach uses the words in documents as features to distinguish entities. Because it does not use word order as a feature and makes only limited use of external knowledge, the traditional approach has performance limitations. This paper presents an approach to named entity disambiguation through entity linking, based on a multi-kernel function and Internet verification, to improve Chinese person name disambiguation. The proposed approach extends a linear kernel that uses in-document word features by adding a string kernel, yielding a multi-kernel function. This multi-kernel function computes the similarity between an input document and each entity description in a named person knowledge base, producing a ranked list of candidate entities. Furthermore, Internet search results, retrieved with keywords extracted from the input document and from the entity descriptions in the knowledge base, are used to train classifiers for verification. Evaluations on the CIPS-SIGHAN 2012 person name disambiguation bakeoff dataset show that incorporating word order through the multi-kernel function, together with Internet knowledge, improves both precision and recall, and that the system achieves state-of-the-art performance.
Chinese person name disambiguation / Internet verification / string kernel / multi-kernel function / machine learning
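To make the multi-kernel idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a cosine-normalised bag-of-words kernel as the linear kernel and a contiguous word n-gram (spectrum) kernel as a simple stand-in for the string kernel; the abstract does not specify either form. The function names, the mixing weight `alpha`, the n-gram length `n`, and the toy knowledge base are all hypothetical.

```python
from collections import Counter
from math import sqrt


def _cosine(a, b):
    """Cosine similarity between two Counter feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def linear_kernel(tokens_a, tokens_b):
    """Bag-of-words kernel: ignores word order entirely."""
    return _cosine(Counter(tokens_a), Counter(tokens_b))


def spectrum_kernel(tokens_a, tokens_b, n=2):
    """Word n-gram kernel (assumed stand-in for the string kernel):
    sensitive to local word order."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return _cosine(ngrams(tokens_a), ngrams(tokens_b))


def multi_kernel(tokens_a, tokens_b, alpha=0.5, n=2):
    """Convex combination of the two kernels; alpha is a hypothetical weight."""
    return (alpha * linear_kernel(tokens_a, tokens_b)
            + (1 - alpha) * spectrum_kernel(tokens_a, tokens_b, n))


def rank_candidates(doc_tokens, knowledge_base, alpha=0.5, n=2):
    """Rank knowledge-base entities by multi-kernel similarity to the document."""
    scores = {
        entity_id: multi_kernel(doc_tokens, description, alpha, n)
        for entity_id, description in knowledge_base.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy example: both descriptions contain the same words, so the linear
# kernel cannot separate them; only the word-order-aware kernel can.
doc = ["professor", "of", "physics", "at", "university"]
kb = {
    "e1": ["professor", "of", "physics"],
    "e2": ["physics", "of", "professor"],
}
ranking = rank_candidates(doc, kb)
```

In this toy case the two entity descriptions are identical as bags of words, so the ordering in `ranking` is decided solely by the n-gram term. The Internet verification step described in the abstract, which trains classifiers on search results, is not covered by this sketch.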