Research on Chinese negation and speculation: corpus annotation and identification
Bowei ZOU, Guodong ZHOU, Qiaoming ZHU
Research on Chinese negation and speculation: corpus annotation and identification
Identifying negative or speculative narrative fragments from facts is crucial for deep understanding on natural language processing (NLP). In this paper, we firstly construct a Chinese corpus which consists of three sub-corpora from different resources. We also present a general framework for Chinese negation and speculation identification. In our method, first, we propose a feature-based sequence labeling model to detect the negative or speculative cues. In addition, a cross-lingual cue expansion strategy is proposed to increase the coverage in cue detection. On this basis, this paper presents a new syntactic structure-based framework to identify the linguistic scope of a negative or speculative cue, instead of the traditional chunking-based framework. Experimental results justify the usefulness of our Chinese corpus and the appropriateness of our syntactic structure-based framework which has showed significant improvement over the state-of-the-art on Chinese negation and speculation identification.
negation / speculation / cue detection / scope resolution / Chinese corpus
[1] |
Morante R, Sporleder C. Modality and negation: an introduction to the special issue. Computational Linguistics, 2012, 38(2): 223–260
CrossRef
Google scholar
|
[2] |
Friedman C, Alderson P O, Austin J H, Cimino J J, Johnson S B. A general natural–language text processor for clinical radiology. American Medical Informatics Association, 1994, 1(2): 161–174
CrossRef
Google scholar
|
[3] |
Di Marco C,Kroon F W, Mercer R E. Using hedges to classify citations in scientific articles. The Information Retrieval Series, 2006, 20: 247–263
CrossRef
Google scholar
|
[4] |
Morante R, Liekens A, Daelemans W. Learning the scope of negation in biomedical texts. In: Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. 2008, 715–724
CrossRef
Google scholar
|
[5] |
Chowdhury M F M, Lavelli A. Exploiting the scope of negations and heterogeneous features for relation extraction: a case study for drugdrug interaction extraction. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013, 765–771
|
[6] |
Averbuch M, Karson T, Ben-Ami B, Maimon O, Rokach L. Contextsensitive medical information retrieval. In: Proceedings of the 11th World Congress on Medical Informatics. 2004, 1–8
|
[7] |
Wilson T A. Fine-grained subjectivity and sentiment analysis: recognizing the intensity, polarity, and attitudes of private states. ProQuest, 2008
|
[8] |
Councill I G, McDonald R, Velikovich L. What’s great and what’s not: learning to classify the scope of negation for improved sentiment analysis. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing. 2010, 51–59
|
[9] |
Zhu X, Guo H, Mohammad S, Kiritchenko S. An empirical study on the effect of negation words on sentiment. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. 2014, 304–313
CrossRef
Google scholar
|
[10] |
Snow R, Vanderwende L, Menezes A. Effectively using syntax for recognizing false entailment. In: Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics. 2006, 33–40
CrossRef
Google scholar
|
[11] |
Baker K, Bloodgood M, Dorr B J, Filardo N W, Levin L, Piatko C. A modality lexicon and its use in automatic tagging. In: Proceedings of the 7th Conference on International Language Resources and Evaluation. 2010, 1402–1407
|
[12] |
Wetzel D, Bond F. Enriching parallel corpora for statistical machine translation with semantic negation rephrasing. In: Proceedings of the 6th Workshop on Syntax, Semantics and Structure in Statistical Translation. 2012, 20–29
|
[13] |
Özgür A, Radev D R. Detecting speculations and their scopes in scientific text. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2009, 1398–1407
CrossRef
Google scholar
|
[14] |
Øvrelid L, Velldal E, Oepen S. Syntactic scope resolution in uncertainty analysis. In: Proceedings of the 23rd International Conference on Computational Linguistics. 2010, 1379–1387
|
[15] |
Apostolova E, Tomuro N, Demner-Fushman D. Automatic extraction of lexico-syntactic patterns for detection of negation and speculation scopes. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers. 2011, 283–287
|
[16] |
Morante R, Daelemans W. A metalearning approach to processing the scope of negation. In: Proceedings of the 13th Conference on Computational Natural Language Learning. 2009, 28–36
CrossRef
Google scholar
|
[17] |
Agarwal S, Yu H. Detecting hedge cues and their scope in biomedical text with conditional random fields. Biomedical Informatics, 2010, 43(6): 953–961
CrossRef
Google scholar
|
[18] |
Sánchez L M, Li B, Vogel C. Exploiting CCG structures with tree kernels for speculation detection. In: Proceedings of the 14th Conference on Computational Natural Language Learning: Shared Task. 2007, 126–131
|
[19] |
Zou B, Zhou G, Zhu Q. Tree kernel-based negation and speculation scope detection with structured syntactic parse features. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2013, 968–976
|
[20] |
Vincze V, Szarvas G, Farkas R, Móra G, Csirik J. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 2008, 9(11): 279–282
CrossRef
Google scholar
|
[21] |
Ji F, Qiu X, Huang X. Exploring uncertainty sentences in Chinese. In: Proceedings of the 16th China Conference on Information Retrieval. 2010, 594–601
|
[22] |
Chen Z, Zou B, Zhu Q, Li P. Chinese negation and speculation detection with conditional random fields. Communications in Computer and Information Science, 2013, 400: 30–40
CrossRef
Google scholar
|
[23] |
Qian Z, Zou B, Li P, Zhu Q. The prediction method of rise or fall in stock markets based on the discrimination of information credibility. In: Proceedings of the 20th China Conference on Information Retrieval. 2014
|
[24] |
Pang B, Lee L. A sentimental education: sentiment analysis using subjectivity. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. 2004, 271–278
CrossRef
Google scholar
|
[25] |
Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 1960, 20: 37–46
CrossRef
Google scholar
|
[26] |
Liu L, Hong Y, Liu H, Wang X, Yao J. Effective selection of translation model training data. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Short Papers. 2014, 569–573
CrossRef
Google scholar
|
[27] |
Och F J, Ney H. A systematic comparison of various statistical alignment models. Computational Linguistics, 2003, 29(1), 19–51
CrossRef
Google scholar
|
[28] |
Jiang Z, Ng T. Semantic role labeling of NomBank: A maximum entropy approach. In: Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. 2006, 138–145
CrossRef
Google scholar
|
[29] |
Zhu Q, Li J, Wang H, Zhou G. A unified framework for scope learning via simplified shallow semantic parsing. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2010, 714–724
|
[30] |
Farkas R, Vincze V, Móra G, Csirik J, Szarvas G. The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. Proceedings of the 14th Conference on Computational Natural Language Learning. 2010
|
[31] |
Che W, Li Z, Liu T. LTP: a Chinese language technology platform. In: Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations. 2010, 13–16
|
/
〈 | 〉 |