Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning

Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou

Journal of Earth Science, 2024, Vol. 35, Issue 3: 1035-1043. DOI: 10.1007/s12583-023-1944-8
Geoscience Big Data

Abstract

Geological reports are an important outcome of geological investigations and scientific research, as they contain rich data and textual information. With the rapid development of science and technology, a large number of text reports have accumulated in the field of geology. However, mainstream geoscience databases for geological information mining often neglect less popular topics and non-English-speaking regions, which makes it harder for researchers to extract the information they need from these texts. Natural Language Processing (NLP) has clear advantages for processing large volumes of text. The objective of this paper is to identify geological named entities in Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which builds on the concept of Prompt Learning and requires only a small amount of annotated data to train a strong model for recognizing geological named entities in low-resource settings. The RoBERTa layer uses dynamic word vectors to capture contextual information and long-distance dependencies. We conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. The results show that the proposed model achieves an F1 score of 80.64%, higher than each of the four baseline algorithms, demonstrating the reliability and robustness of the model for Named Entity Recognition in geological texts.
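The abstract reports results as an F1 score but does not state the evaluation convention. A common convention in NER (used, for example, in the CoNLL-2003 shared task) is entity-level exact match: a predicted entity counts as correct only if both its type and its boundaries match the gold annotation. The sketch below illustrates that convention, assuming character-level BIO tags and micro-averaging over all entities; the entity types `ROCK` and `STRAT` and the helper names `bio_to_spans` and `entity_f1` are hypothetical and are not taken from the paper or its evaluation code.

```python
from typing import List, Optional, Set, Tuple

# (entity_type, start_index, end_index_exclusive) over character positions
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    # Append a sentinel "O" so an entity that ends the sequence is flushed.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity continues
        if etype is not None:
            spans.add((etype, start, i))  # close the open entity
            etype = None
        if tag != "O":
            # B-X starts an entity; a stray I-X (no matching open entity)
            # is also treated as a start, a common lenient convention.
            etype, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact type and boundary match)."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # same type and same boundaries
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy sentence 花岗岩侵入志留系 ("granite intrudes the Silurian System"), one tag per character.
# ROCK and STRAT are hypothetical entity types used only for illustration.
gold_tags = ["B-ROCK", "I-ROCK", "I-ROCK", "O", "O", "B-STRAT", "I-STRAT", "I-STRAT"]
pred_tags = ["B-ROCK", "I-ROCK", "I-ROCK", "O", "O", "B-STRAT", "I-STRAT", "O"]

print(entity_f1([gold_tags], [pred_tags]))  # prints (0.5, 0.5, 0.5)
```

In the toy example, the truncated STRAT prediction counts as both a false positive and a false negative, which is why exact-match entity F1 is stricter than per-character accuracy.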

Keywords

Prompt Learning / Named Entity Recognition (NER) / low resource geological text / text information mining / big data / geology

Cite this article

Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou. Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning. Journal of Earth Science, 2024, 35(3): 1035‒1043 https://doi.org/10.1007/s12583-023-1944-8
