Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning
Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou
Geological reports are a major output of geological surveys and scientific research, containing rich data and textual information. With the rapid development of science and technology, a large volume of textual reports has accumulated in the field of geology. However, mainstream geoscience databases neglect many non-hot topics and non-English-speaking regions in geological information mining, making it more difficult for researchers to extract the information they need from these texts. Natural Language Processing (NLP) offers clear advantages for processing large amounts of textual data. The objective of this paper is to identify geological named entities in Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which leverages prompt learning and requires only a small amount of annotated data to train strong models for recognizing geological named entities under low-resource settings. The RoBERTa layer captures contextual information and long-distance dependencies through dynamic word vectors. Finally, we conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. The proposed model achieves an F1 score of 80.64%, the highest among the four baseline algorithms compared, demonstrating the reliability and robustness of the model for named entity recognition of geological texts.
Prompt Learning / Named Entity Recognition (NER) / low resource geological text / text information mining / big data / geology
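The prompt-learning formulation of NER described in the abstract can be illustrated with a minimal sketch: candidate spans from a sentence are wrapped in a cloze-style template whose [MASK] slot a masked language model such as RoBERTa would then fill with a label word. The template, the verbalizer (entity type to label word mapping), and the example sentence below are hypothetical illustrations for exposition only, not the paper's actual configuration.

```python
# Hypothetical verbalizer: entity type -> label word the model should
# predict at the [MASK] slot (illustrative, not the paper's own mapping).
LABEL_WORDS = {
    "ROCK": "rock",
    "STRATUM": "stratum",
    "MINERAL": "mineral",
    "NONE": "none",
}

def build_prompt(sentence: str, span: str) -> str:
    """Append a cloze-style template for one candidate span to the sentence."""
    return f"{sentence} {span} is a [MASK] entity."

def enumerate_spans(tokens: list[str], max_len: int = 3) -> list[str]:
    """Enumerate all contiguous token spans up to max_len as entity candidates."""
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            spans.append(" ".join(tokens[i:j]))
    return spans

# Example sentence (hypothetical, in English for readability; the paper
# works on Chinese geological text).
tokens = ["granite", "intrudes", "the", "Cambrian", "strata"]
sentence = " ".join(tokens)
prompts = [build_prompt(sentence, span) for span in enumerate_spans(tokens)]
```

In a full system, each prompt would be scored by the pretrained masked language model, and the label word with the highest probability at [MASK] decides the span's entity type (or NONE); because the task is cast as cloze filling, the model's pretrained knowledge is reused directly, which is what allows training with only a small amount of annotated data.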