BUEES: a bottom-up event extraction system
Xiao DING, Bing QIN, Ting LIU
Traditional event extraction systems focus mainly on event type identification and event participant extraction, and rely on pre-specified event type paradigms and manually annotated corpora. However, different domains have different event type paradigms, so moving to a new domain means building a new event type paradigm and annotating a new corpus from scratch. This requires massive human effort and prevents event extraction from being widely applicable. In this paper, we present BUEES, a bottom-up event extraction system that extracts events from the web in a completely unsupervised way. The system automatically builds an event type paradigm from the input corpus, then extracts a large number of instance patterns for these event types, and finally extracts event arguments according to these patterns. In a series of experiments, we evaluate BUEES and compare it with a state-of-the-art supervised Chinese event extraction system. Experimental results show that BUEES performs at least comparably (5% higher F-measure in event type identification and 3% higher F-measure in event argument extraction) without any human effort.
Event extraction / Unsupervised learning / Bottom-up
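The F-measure figures quoted in the abstract are conventionally computed as the weighted harmonic mean of precision and recall. The following is a minimal sketch of that standard calculation, assuming the balanced variant (beta = 1); the abstract does not state which weighting was used, and the function name and the counts in the usage note are hypothetical illustrations, not values from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return the F-beta score computed from raw counts.

    Assumption: beta defaults to 1.0 (balanced F1); the paper's abstract
    does not specify the weighting it used.
    """
    predicted = true_positives + false_positives   # items the system output
    actual = true_positives + false_negatives      # items in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when nothing is correct
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

For example, with hypothetical counts of 6 correct extractions, 2 spurious ones, and 6 missed ones, precision is 0.75, recall is 0.5, and `f_measure(6, 2, 6)` gives 0.6.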