LungDiag: Empowering artificial intelligence for respiratory diseases diagnosis based on electronic health records, a multicenter study

Hengrui Liang, Tao Yang, Zihao Liu, Wenhua Jian, Yilong Chen, Bingliang Li, Zeping Yan, Weiqiang Xu, Luming Chen, Yifan Qi, Zhiwei Wang, Yajing Liao, Peixuan Lin, Jiameng Li, Wei Wang, Li Li, Meijia Wang, Yun Hui Zhang, Lizong Deng, Taijiao Jiang, Jianxing He

MedComm, 2025, Vol. 6, Issue 1: e70043

DOI: 10.1002/mco2.70043
ORIGINAL ARTICLE

Abstract

Respiratory diseases pose a significant global health burden. Early and accurate diagnosis is difficult because clinical symptoms overlap, which often leads to misdiagnosis or delayed treatment. To address this issue, we developed LungDiag, an artificial intelligence (AI)-based diagnostic system that uses natural language processing (NLP) to extract key clinical features from electronic health records (EHRs) for the accurate classification of respiratory diseases. This study employed a large cohort of 31,267 EHRs from multiple centers for model training and internal testing. Additionally, prospective real-world validation was conducted using 1,142 EHRs from three external centers. LungDiag demonstrated superior diagnostic performance, achieving an F1 score of 0.711 for top 1 diagnosis and 0.927 for top 3 diagnoses. In real-world testing, LungDiag outperformed both human experts and ChatGPT 4.0, achieving an F1 score of 0.651 for top 1 diagnosis. The study emphasizes the potential of LungDiag as an effective tool to support physicians in diagnosing respiratory diseases more accurately and efficiently. Despite these promising results, further multicenter validation with larger sample sizes is needed to confirm its clinical utility and generalizability.
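The abstract reports F1 scores for top 1 and top 3 diagnoses without stating how a ranked list is scored. One common convention, assumed here and not confirmed by the source, is to count a case as correct when the true diagnosis appears among the first k ranked candidates, and then to take a macro-averaged F1 over the resulting labels. The following is a minimal sketch under that assumption; the function names, disease labels, and rankings are illustrative only and are not taken from the study.

```python
from collections import defaultdict


def top_k_effective_predictions(y_true, ranked_preds, k):
    """Collapse each ranked list to one label (assumed convention):
    the true label if it is within the first k candidates,
    otherwise the top-ranked candidate."""
    effective = []
    for truth, ranked in zip(y_true, ranked_preds):
        effective.append(truth if truth in ranked[:k] else ranked[0])
    return effective


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # Each label occurs at least once, so the denominator is never zero.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom)
    return sum(scores) / len(scores)


# Hypothetical toy data: four cases, three candidate diagnoses each.
y_true = ["asthma", "COPD", "pneumonia", "COPD"]
ranked = [
    ["asthma", "COPD", "pneumonia"],
    ["asthma", "COPD", "pneumonia"],
    ["pneumonia", "asthma", "COPD"],
    ["pneumonia", "asthma", "COPD"],
]

for k in (1, 2, 3):
    score = macro_f1(y_true, top_k_effective_predictions(y_true, ranked, k))
    print(f"top-{k} macro F1: {score:.3f}")
```

Under this convention the score cannot decrease as k grows, which is consistent with the top 3 figure in the abstract being higher than the top 1 figure. Other averaging schemes (micro or weighted) would give different numbers.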

Keywords

artificial intelligence (AI) / electronic health records (EHRs) / natural language processing (NLP) / respiratory diseases

Cite this article

Hengrui Liang, Tao Yang, Zihao Liu, Wenhua Jian, Yilong Chen, Bingliang Li, Zeping Yan, Weiqiang Xu, Luming Chen, Yifan Qi, Zhiwei Wang, Yajing Liao, Peixuan Lin, Jiameng Li, Wei Wang, Li Li, Meijia Wang, Yun Hui Zhang, Lizong Deng, Taijiao Jiang, Jianxing He. LungDiag: Empowering artificial intelligence for respiratory diseases diagnosis based on electronic health records, a multicenter study. MedComm, 2025, 6(1): e70043. DOI: 10.1002/mco2.70043

RIGHTS & PERMISSIONS

© 2025 The Author(s). MedComm published by Sichuan International Medical Exchange & Promotion Association (SCIMEA) and John Wiley & Sons Australia, Ltd.
