RESEARCH ARTICLE

Enhancing named entity recognition with a novel BERT-BiLSTM-CRF-RC joint training model for biomedical materials database

Mufei Li, Yan Zhuang, Ke Chen, Lin Han, Xiangfeng Li, Yongtao Wei, Xiangdong Zhu, Mingli Yang, Guangfu Yin, Jiangli Lin, Xingdong Zhang

Materials Genome Engineering Advances, 2025, Vol. 3, Issue 1: e70001. DOI: 10.1002/mgea.70001

Abstract

In this study, we propose a novel joint training model for named entity recognition (NER) that combines BERT, BiLSTM, CRF, and a reading comprehension (RC) mechanism. Traditional BERT-BiLSTM-CRF models often struggle with inaccurate boundary detection and excessive fragmentation of named entities because they lack specialized domain vocabulary. Our model addresses these issues by integrating an RC mechanism, which refines fragmented results by enabling the model to identify entity boundaries more precisely without relying on an expert-annotated dictionary. Segmentation issues are further mitigated through a segmented technique that combines voting with positive-sample coverage. We applied this model to develop a database for mesoporous bioactive glass (MBG). In addition, a classifier was developed to automatically detect whether a paragraph contains pertinent information. For this study, 200 articles were retrieved using MBG-related keywords, and the data were split into training and test sets in a 9:1 ratio: 492 paragraphs were automatically extracted for training and 50 for testing. The results demonstrate that our joint training model achieves an accuracy of 92.8% in named entity recognition, 4.3 percentage points higher than the 88.5% accuracy of the traditional BERT-BiLSTM-CRF model.
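The abstract specifies the architecture only at a high level. The sketch below shows one plausible way to wire a BERT encoder, a BiLSTM, a CRF tagging head, and a SQuAD-style RC span head into a single jointly trained model. It is an illustrative reconstruction, not the authors' released code: the encoder checkpoint, tag count, hidden sizes, the pytorch-crf dependency, and the unweighted sum of the two losses are all assumptions.

    import torch.nn as nn
    from transformers import AutoModel  # HuggingFace encoder backbone
    from torchcrf import CRF            # pytorch-crf package (assumed dependency)

    class BertBiLstmCrfRc(nn.Module):
        """Joint NER (CRF) + reading-comprehension (span) model; illustrative only."""

        def __init__(self, encoder_name="bert-base-cased", num_tags=9, lstm_hidden=256):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            enc_dim = self.encoder.config.hidden_size
            self.bilstm = nn.LSTM(enc_dim, lstm_hidden,
                                  batch_first=True, bidirectional=True)
            self.tag_proj = nn.Linear(2 * lstm_hidden, num_tags)  # CRF emission scores
            self.crf = CRF(num_tags, batch_first=True)            # structured tag decoding
            self.span_proj = nn.Linear(2 * lstm_hidden, 2)        # RC start/end logits

        def forward(self, input_ids, attention_mask,
                    tags=None, start_pos=None, end_pos=None):
            x = self.encoder(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
            x, _ = self.bilstm(x)            # contextual recurrence over BERT states
            emissions = self.tag_proj(x)
            start_logits, end_logits = self.span_proj(x).unbind(dim=-1)
            mask = attention_mask.bool()
            if tags is None:                 # inference: Viterbi tags + span scores
                return self.crf.decode(emissions, mask=mask), start_logits, end_logits
            # Joint training objective: negative CRF log-likelihood plus
            # SQuAD-style cross-entropy over start/end token positions.
            ner_loss = -self.crf(emissions, tags, mask=mask, reduction="mean")
            ce = nn.CrossEntropyLoss()
            rc_loss = ce(start_logits, start_pos) + ce(end_logits, end_pos)
            return ner_loss + rc_loss

In a setup like this, the RC span predictions can be used at inference time to merge or re-score fragmented CRF entities, which is consistent with the boundary-refinement role the abstract assigns to the RC mechanism.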

Keywords

automated material database / BERT-BiLSTM-CRF-RC / named entity recognition / reading comprehension

Cite this article

Mufei Li, Yan Zhuang, Ke Chen, Lin Han, Xiangfeng Li, Yongtao Wei, Xiangdong Zhu, Mingli Yang, Guangfu Yin, Jiangli Lin, Xingdong Zhang. Enhancing named entity recognition with a novel BERT-BiLSTM-CRF-RC joint training model for biomedical materials database. Materials Genome Engineering Advances, 2025, 3(1): e70001. DOI: 10.1002/mgea.70001



RIGHTS & PERMISSIONS

© 2025 The Author(s). Materials Genome Engineering Advances published by Wiley-VCH GmbH on behalf of University of Science and Technology Beijing.
