2025, Volume 18 Issue 3

  • ARTICLE
    Li Shen , Chao Gao , Tianrui He , Liting Chu , Jie Wang , Zhenlin Zhang , Guangjun Yu
    2025, 18(3): e70045. https://doi.org/10.1111/jebm.70045

    Objective: The early identification of osteoporosis and vertebral fractures (VFs) is vital for improving the quality of life in elderly men. This study aimed to validate the effectiveness of a self-assessment tool for the primary screening of osteoporosis and VFs in elderly men.

    Methods: This real-world study analyzed data from two sources: an electronic health record (EHR) database comprising 7187 subjects and a community database including 6313 subjects. Restricted cubic spline curves were utilized to analyze the relationship between the osteoporosis self-assessment tool for Asians (OSTA) index and the prevalence of osteoporosis, overall VFs, and moderate to severe VFs. Diagnostic performance was assessed by calculating sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), and optimal cutoff values were determined for different age groups.

    Results: With a cutoff value of −1, the OSTA index demonstrated good diagnostic performance for identifying osteoporosis, achieving an AUC of 0.712 (p < 0.001), with sensitivity and specificity of 81.6% and 78.1%, respectively. The screening performance was notably higher among individuals aged 70–79 and those over 80 years, with AUCs of 0.79 and 0.81, respectively, and sensitivities exceeding 90%. For moderate to severe VFs, the OSTA index demonstrated a sensitivity of 86.6%, a specificity of 53.1%, and an AUC of 0.628.

    Conclusions: This large-scale real-world study supports the utility of the OSTA index as a valid tool for the primary screening of osteoporosis and VFs in elderly men.
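
The screening rule in this abstract can be illustrated with a minimal sketch. It assumes the commonly published OSTA formula, 0.2 × (body weight in kg − age in years), which the abstract itself does not state, and it assumes that values at or below the cutoff count as screen-positive. The function names and the example people are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def osta_index(weight_kg: float, age_years: float) -> float:
    """OSTA index under the commonly published formula (an assumption here)."""
    return 0.2 * (weight_kg - age_years)


def screen_positive(index: float, cutoff: float = -1.0) -> bool:
    """Flag a person for follow-up testing.

    Assumption: an index at or below the cutoff is screen-positive.
    """
    return index <= cutoff


def sensitivity_specificity(
    flags: Sequence[bool], truth: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity and specificity of screening flags against a reference diagnosis."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fn = sum(1 for f, t in zip(flags, truth) if not f and t)
    tn = sum(1 for f, t in zip(flags, truth) if not f and not t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy subject")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical people as (weight in kg, age in years); not study data.
people = [(70, 60), (55, 75), (80, 70), (50, 80)]
flags = [screen_positive(osta_index(w, a)) for w, a in people]
```

Sensitivity is the share of true cases that the rule flags, and specificity is the share of non-cases that it leaves unflagged, which is how the percentages in the Results should be read.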

  • ARTICLE
    Yuling Cao , Lanya Peng , Yipeng Zhang , Cui Yang
    2025, 18(3): e70053. https://doi.org/10.1111/jebm.70053

    Aim: Biomedical entity linking is essential in natural language processing for identifying and linking biomedical concepts to entities in a knowledge base. Current methods, which involve a multistage recognition-retrieve-read process, achieve high performance but are hindered by slow inference times and error propagation.

    Methods: We propose ER2, an end-to-end entity linking paradigm that follows a retrieval-rerank framework. It selects mentions in context in reverse, starting from the prior knowledge of candidate entities, and thereby performs candidate retrieval, mention detection, and candidate reranking jointly in one pass via a lightweight reranker that models deep relevance between the context and its candidates at the embedding level. We further introduce a more powerful cross-encoder as the teacher model and improve reranking performance through knowledge distillation from the teacher to the student reranker.

    Results: Experiments on several end-to-end entity linking benchmarks demonstrate the efficiency and effectiveness of the method. Notably, our method achieves competitive performance compared with the previous state-of-the-art methods while being nearly 10 times faster.

    Conclusions: This work offers a useful reference for linking mentions in unstructured text to their corresponding entities in knowledge bases (KBs), which can benefit downstream tasks such as automatic diagnosis, drug–drug interaction prediction, and personalized medicine.
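
The teacher-to-student distillation step mentioned in the Methods can be sketched as follows. The abstract does not specify the loss, so this sketch assumes the widely used temperature-softened KL divergence between the teacher's and the student's score distributions over the same candidate entities. The function names, the temperature value, and the scores are all hypothetical.

```python
import numpy as np


def softmax(scores, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax over candidate scores."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max to avoid overflow in exp
    e = np.exp(z)
    return e / e.sum()


def distillation_loss(teacher_scores, student_scores, temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened candidate distributions.

    Assumption: this generic loss stands in for whatever objective ER2 uses.
    """
    eps = 1e-12  # guards against log(0)
    p = np.clip(softmax(teacher_scores, temperature), eps, 1.0)
    q = np.clip(softmax(student_scores, temperature), eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))


# Hypothetical relevance scores for three candidate entities of one mention.
teacher = [4.0, 1.0, 0.5]   # cross-encoder (teacher)
student = [2.0, 1.5, 1.0]   # lightweight reranker (student)
loss = distillation_loss(teacher, student)
```

The loss is zero when the student ranks candidates with the same softened distribution as the teacher and grows as the two distributions diverge.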

  • ARTICLE
    Jianchang Xie , Lu Ye , Jianmin Yang , Yigang Zhong , Peng Xu , Beibei Gao , Ningfu Wang , Xianhua Ye , Guoxin Tong , Jinyu Huang
    2025, 18(3): e70054. https://doi.org/10.1111/jebm.70054

    Objective: This systematic review and network meta-analysis aimed to compare the accuracy of coronary angiographic-derived fractional flow reserve (Angio-FFR), optical coherence tomography (OCT)-FFR, and intravascular ultrasound (IVUS)-FFR in evaluating the severity of coronary artery stenosis.

    Methods: PubMed, Embase, and Cochrane Library were searched from January 1, 2010 to April 1, 2024 for studies on the accuracy assessment of Angio-FFR, OCT-FFR, and IVUS-FFR. A network meta-analysis was performed with accuracy and analysis of variance models. The diagnostic performance was evaluated through absolute sensitivity (SEN), specificity (SPE), diagnostic dominance index (DDI), and diagnostic odds ratio (DOR), with the corresponding 95% confidence interval (CI).

    Results: The analysis included 86 studies (16,552 lesions). Network meta-analysis showed that IVUS-FFR demonstrated the highest absolute SEN of 0.92 (0.91, 0.94), while OCT-FFR demonstrated the highest absolute SPE of 0.92 (0.91, 0.94). In comparison to Angio-FFR, intracoronary imaging (ICI)-FFR demonstrated superior diagnostic performance, with a DDI and DOR of 1.00 (95% CI: 0.99, 1.01) versus 0.96 (95% CI: 0.94, 0.98) and 79.18 (95% CI: 62.20, 92.35) versus 56.15 (95% CI: 52.86, 59.29), respectively. Furthermore, ICI-FFR demonstrated significantly greater overall accuracy than Angio-FFR, with a relative risk of 1.03 (95% CI: 1.01–1.04).

    Conclusion: This comprehensive network meta-analysis establishes that ICI-FFR provides superior diagnostic performance for coronary artery stenosis assessment compared to Angio-FFR. These findings support the clinical value of ICI modalities in functional stenosis evaluation.

  • ARTICLE
    Jiaxin Zhang , Xiaojian Ji , Lidong Hu , Yiwen Wang , Simin Liao , Jiawen Hu , Yinan Zhang , Lulu Zeng , Shiwei Yang , Jian Zhu , Feng Huang
    2025, 18(3): e70055. https://doi.org/10.1111/jebm.70055

    Objective: The efficacy of sulfasalazine (SSZ) in axial spondyloarthritis (ax-SpA) patients meeting both 2009 ax-SpA and 2011 peripheral SpA (p-SpA) criteria is unclear. This study aimed to assess SSZ's clinical efficacy in pure ax-SpA and overlapping ax-SpA patients and to identify factors influencing peripheral symptom development.

    Methods: SpA patients from 2016 to 2023 at the First Medical Center of People's Liberation Army General Hospital were categorized into pure ax-SpA and ax-SpA with peripheral features. They received nonsteroidal anti-inflammatory drugs (NSAIDs) alone or with SSZ. The study evaluated SSZ's efficacy, peripheral symptom timing and prevalence, and factors affecting symptom onset using Cox regression.

    Results: Of 670 SpA patients, 469 maintained pure axial involvement throughout follow-up, while 201 developed peripheral symptoms. The SSZ plus NSAIDs group demonstrated a significantly lower Axial Spondyloarthritis Disease Activity Score than the NSAIDs-only group in both the pure axial (1.2 vs. 1.8; p = 0.015) and axial-with-peripheral subgroups (1.1 vs. 1.8; p = 0.011). New peripheral symptoms occurred in 33.2% of the NSAIDs group and 15.1% of the SSZ plus NSAIDs group. SSZ reduced the risk of peripheral symptom development (hazard ratio [HR] = 0.489, p = 0.0028). In the Cox proportional hazards model adjusted for age, sex, smoking status, body mass index, and Human Leukocyte Antigen B27 status, male gender (HR = 0.64, p = 0.020) and SSZ use (HR = 0.47, p = 0.004) emerged as protective factors, whereas smoking significantly increased risk (HR = 1.97, p < 0.001).

    Conclusions: SSZ reduces disease activity, improves quality of life among ax-SpA patients, reduces the incidence of peripheral symptoms, and delays their onset. About one-third of ax-SpA patients develop peripheral symptoms over time, which are associated with poorer functional status and quality of life. Smoking increases this risk, while male gender and SSZ use offer protection.

  • GUIDELINE
    Haiming Wang , Yi Li , Chengqi He
    2025, 18(3): e70056. https://doi.org/10.1111/jebm.70056

    Osteoporosis (OP) is one of the most serious health problems, causing a huge economic burden to patients, families, and society. OP rehabilitation treatment plays an important role in relieving pain, reducing the risk of fracture, improving the ability to perform daily activities, and promoting the healing of OP fractures, and has been increasingly valued and recommended by domestic and foreign guidelines. This guideline updates the Chinese Expert Consensus on Rehabilitation Intervention for Primary OP (2019 edition) and was initiated by the Chinese Society of Physical Medicine and Rehabilitation and the West China Hospital of Sichuan University. Its development followed domestic and international guideline development methods and principles, with clinical questions selected through clinical issue screening and deconstruction and two rounds of Delphi questionnaire consultation. The International Classification of Functioning, Disability and Health (ICF) was used as the theoretical framework. The guideline was developed based on the best available evidence, used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) method to grade the quality of evidence and recommendations, and was reported according to the Reporting Items for Practice Guidelines in Healthcare. Taking into account patients' preferences and values and the needs of Chinese clinical practice, it puts forward 22 recommendations covering seven aspects (rehabilitation assessment, therapeutic modalities, occupational therapy, assistive devices, cognitive behavior and psychological therapy, traditional Chinese medicine therapy, and health education) to systematically standardize OP rehabilitation.

  • ARTICLE
    Yangguang Liu , Jiahuan Li , Yingwen Chen , Lingxiao Li , Ling Zhao , Xiaomei Zhang , Yuli Huang
    2025, 18(3): e70057. https://doi.org/10.1111/jebm.70057

    Background: The impact of intravenous iron therapy in heart failure (HF) patients with iron deficiency (ID) is still controversial.

    Method: We performed an extensive search of electronic databases for pertinent studies, encompassing all records up to March 4, 2024. We then synthesized the collected outcome data in a meta-analysis using random-effects models.

    Result: Fourteen trials with 7786 participants (iron therapy: n = 3994; control: n = 3792) were included. Intravenous iron therapy decreased the risk of the composite of total hospitalization for HF and cardiovascular (CV) death (RR = 0.82 [0.72, 0.92]), total hospitalization for HF (RR = 0.78 [0.66, 0.91]), the composite of first hospitalization for HF and CV death (OR = 0.78 [0.65, 0.93]), CV death (OR = 0.86 [0.76, 0.98]), and first hospitalization for HF (OR = 0.77 [0.61, 0.99]), but did not significantly reduce the risk of all-cause mortality (OR = 0.93 [0.83, 1.04]). Furthermore, intravenous iron treatment improved the 6-min walking test (6MWT) distance (WMD = 18.99 [7.41, 30.57]). Subgroup analyses found that intravenous iron may be more beneficial in HF patients with transferrin saturation (TSAT) <20% and in those with ischemic heart disease. Meta-regression analysis revealed that baseline hemoglobin level was a significant moderator of the therapeutic efficacy of intravenous iron supplementation.

    Conclusion: For HF patients with ID, intravenous iron therapy can decrease the risk of hospitalization for HF and CV death and improve exercise capacity. Patients with ischemic cardiomyopathy or with TSAT <20% may derive greater benefit from intravenous iron therapy.
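
Random-effects pooling of the kind named in the Method can be sketched as follows. The abstract does not say which estimator was used, so this sketch assumes the DerSimonian-Laird estimator, a common default, applied to log-scale effects such as log risk ratios. The function name and inputs are hypothetical and no trial data from the review are used.

```python
import numpy as np


def dersimonian_laird(effects, variances):
    """Pool study effects with the DerSimonian-Laird random-effects estimator.

    effects   : per-study effect estimates (for ratios, on the log scale)
    variances : per-study sampling variances of those estimates
    Returns (pooled effect, its standard error, between-study variance tau^2).
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                  # fixed-effect weights
    fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)      # truncated at zero
    w_star = 1.0 / (v + tau2)                    # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return float(pooled), float(se), float(tau2)
```

A pooled log ratio and its standard error convert to a ratio with a 95% interval via `exp(pooled ± 1.96 * se)`, which is the bracketed form reported in the Result paragraph.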

  • ARTICLE
    Yanjiao Shen , Parpia Sameer , Xin Xia , Yuqing Zhang , Jinhui Ma , Qingyang Shi , Qiukui Hao , Xianlin Gu , Wenbo He , Yamin Chen , Na Zhang , Le Wang , Yating Zeng , Xiaoyi Su , Qiang Zong , Qiao Zhi , Sitong Liu , Xinyao Wang , Xinyu Zou , Ying He , Qiong Guo , Borong Wang , Liang Du , Zhengchi Li , Jin Huang , Guyatt Gordon
    2025, 18(3): e70058. https://doi.org/10.1111/jebm.70058

    Aim: To summarize the optimal strategies for dealing with missing binary outcome data (MBOD) in randomized controlled trials (RCTs) as informed by simulation studies, and to summarize the quality of reporting in these studies.

    Methods: To identify simulation studies comparing at least two strategies to deal with MBOD and evaluating their performance (bias, coverage and power), we searched MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials via Ovid, Web of Science, and JSTOR from their inception up to December 20, 2023. We evaluated reporting quality using established criteria for simulation studies in medical statistics. We summarized data using descriptive statistics and a narrative synthesis.

    Results: Our search identified 29,460 citations, of which five proved eligible. Multiple imputation (MI), investigated in five studies, showed consistently good performance in all domains tested for missing completely at random (MCAR) and missing at random (MAR) but with important limitations in missing not at random (MNAR). Complete case analysis (CCA), investigated in four studies of which three addressed model-based CCA, performed well in bias and coverage under MAR and MCAR, but less well for MNAR. One study reported that non-model-based CCA performed poorly with respect to bias under MAR. Non-model-based single imputation, investigated in two studies, showed consistently poor performance across all domains tested for MAR, MCAR and MNAR. One study reported that model-based single imputation performed well with respect to bias under MAR. Regarding reporting quality, all studies reported the aims, dependence of simulated data sets, scenarios and statistical methods evaluated, number of simulations performed, justification of data generation and criteria used to evaluate the simulation performance. None of the studies reported the starting seeds, random number generators and failures occurring during simulation.

    Conclusions: Simulation studies addressing methods to deal with MBOD in RCTs provided evidence that the MI approach is superior to CCA with respect to bias and coverage. Non-model-based single imputation generally performed poorly.

  • ARTICLE
    Mingsheng Sun , Chaorong Xie , Yanan Wang , Xuguang Yang , Linlin Dong , Taipin Guo , Xiaoqin Chen , Jing Luo , Yutong Zhang , Xixiu Ni , Lu Liu , Jiao Chen , Siyuan Zhou , Ling Zhao
    2025, 18(3): e70059. https://doi.org/10.1111/jebm.70059

    Objective: Acupuncture is recognized as an effective migraine treatment, but the comparative long-term efficacy of different acupuncture methods at identical acupoints remains unclear. This study investigates the prophylactic effects of manual acupuncture (manual penetrating acupuncture, MPA) versus sham acupuncture (non-penetrating acupuncture, NPA) at the same acupoints.

    Methods: In this multicenter, single-blind randomized controlled trial conducted across four Chinese clinical centers (May 2020 to September 2022), 192 migraineurs without aura (International Classification of Headache Disorders 3rd edition criteria) were randomized 1:1 to 12 sessions of MPA or NPA. Primary outcome was the change from baseline in migraine attack frequency at week 16; secondary outcomes included migraine attack frequency, responder rates, migraine days, and pain intensity (every 4 weeks), etc. Trial registration: No. ChiCTR2000032308.

    Results: A total of 198 participants were randomly allocated to either MPA or NPA groups, 99 in each group. At 16 weeks, the reduction in migraine attacks was greater with MPA than with NPA (mean difference [MD] = –0.6, 95% confidence interval [CI] –1.5 to 0.05; p = 0.069). MPA demonstrated superior responder rates (risk difference = 17.2%, 95% CI 5.2 to 29.1; p = 0.007) and pain reduction (MD = –0.6, 95% CI –1.1 to –0.2; p = 0.003) after treatment. At follow-up, MPA improved all migraine symptoms and some quality-of-life measures compared with NPA. Adverse events occurred in 5.1% of MPA participants.

    Conclusions: Although MPA and NPA showed comparable preventive effects, MPA provided sustained symptom relief and quality-of-life improvements. Therefore, suitable acupoint selection establishes therapeutic potential, whereas acupuncture methods critically determine long-term clinical benefits.

  • ARTICLE
    Guoqiang Liu , Peng Liu , Sijun Yu , Xia Liu , Fan Wang , Chunhui Liu , Xiaozhen Hu , Yongquan Jing , Linqiang Liu , Xuxia Zhang , Yuzeng Xue , Guanzhong Zheng , ChangYu Wang , Zhongming Zhao , Yanjie Zheng , Wenzhai Cao , Huanyi Zhang , Feng Gao , Jing Zhou , Zidong Bie , Guoqiang Yuan , Lei Wang , Jun Qian , Xiaochen Tian , Haitao Zhang , Xiangdong Li , Zhenhua Jia , Ningxin Ding , Yuejin Yang
    2025, 18(3): e70060. https://doi.org/10.1111/jebm.70060

    Aim: Tongxinluo (TXL) capsule, a traditional Chinese medicine compound, has proven effective in acute myocardial infarction (AMI), but its cost-effectiveness is unclear.

    Methods: This economic evaluation utilized individual data from clinical trials to compare major adverse cardiac and cerebrovascular events (MACCEs) at 30 days and quality-adjusted life years (QALYs) at 1 year between an intervention group (TXL combined with conventional therapy) and the control group (placebo plus conventional therapy), from a healthcare perspective. A lifetime cost-utility analysis (CUA) was conducted using a Markov model, and sensitivity analyses were performed to evaluate the robustness of the results.

    Results: A total of 3777 patients (TXL: 1889; placebo: 1888) were included in the analysis. The 30-day total costs for the TXL and placebo groups were ¥38,561 ($5399) and ¥39,217 ($5490), respectively, showing no statistical difference. The 30-day MACCEs rates were 3.39% for the TXL group and 5.24% for the placebo group (p < 0.006), indicating TXL's superiority in effectiveness at 30 days. Over a lifetime, the TXL group incurred higher total costs (¥97,108 [$13,595] vs. ¥92,033 [$12,885]) and gained more QALYs (6.70 vs. 6.30). The incremental cost-effectiveness ratio for TXL was ¥12,421/QALY ($1739), below the threshold of one times gross domestic product (GDP) per capita, which was ¥89,358 ($12,510) in China in 2023. Sensitivity analysis confirmed robust results, revealing that TXL was more likely to be accepted over the placebo when the willingness to pay exceeds ¥12,500 ($1739).

    Conclusions: TXL is a cost-effective option compared to placebo in AMI.
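
The incremental cost-effectiveness ratio (ICER) logic in the Results can be reproduced as a short arithmetic sketch. It uses only the rounded lifetime costs and QALYs printed in the abstract, so it yields roughly ¥12,700 per QALY rather than the reported ¥12,421; the gap presumably reflects rounding of the published inputs, which is an assumption here. The function name is hypothetical.

```python
def icer(cost_new: float, cost_old: float, effect_new: float, effect_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when effects are equal")
    return (cost_new - cost_old) / delta_effect


# Rounded lifetime figures from the abstract (costs in yuan, effects in QALYs).
ratio = icer(cost_new=97_108, cost_old=92_033, effect_new=6.70, effect_old=6.30)

# Threshold quoted in the abstract: one times GDP per capita in China, 2023.
threshold = 89_358
is_cost_effective = ratio < threshold
```

Because the ratio sits far below the threshold, the conclusion does not depend on which of the two ICER values is used.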

  • ARTICLE
    Sha Diao , Yuan Feng , Xue Peng , Dan Liu , Liang Huang , Linan Zeng , Lingli Zhang
    2025, 18(3): e70061. https://doi.org/10.1111/jebm.70061

    Objective: To evaluate zinc supplementation's efficacy in pregnancy, addressing gaps in previous reviews regarding high-risk subgroups and combination therapies.

    Methods: We systematically searched six databases through March 27, 2025 for randomized controlled trials (RCTs) on prenatal zinc supplementation. Risk of bias was assessed using the Cochrane Risk of Bias 2 tool. Stratified analyses were conducted by participant or intervention characteristics, with meta-analysis or qualitative synthesis when appropriate. Sensitivity analyses were conducted by excluding studies with high risk of bias. The systematic review was registered in PROSPERO (CRD42023440314).

    Results: 77 RCTs were included. Compared with no zinc, zinc monotherapy among healthy pregnant women resulted in higher serum zinc levels (standardized mean difference [SMD] in the second trimester = 0.32, 95% confidence interval [CI] 0.20 to 0.44; SMD in the third trimester = 0.51, 95% CI 0.27 to 0.76), a lower fetal intrauterine retardation rate (risk ratio = 0.23, 95% CI 0.16 to 0.35), longer neonatal birth length (SMD = 0.66, 95% CI 0.21 to 1.12), larger birth head circumference (SMD = 0.58, 95% CI 0.08 to 1.09), and higher 1-min Apgar score (SMD = 0.28, 95% CI 0.06 to 0.49) and cord blood zinc level (SMD = 0.36, 95% CI 0.17 to 0.56). No additional benefits were observed with zinc-iron-folate combinations versus iron-folate alone. Qualitative synthesis of limited evidence suggested potential benefits for high-risk groups (anemia, gestational diabetes, zinc deficiency or impaired intravenous glucose tolerance test).

    Conclusions: Zinc monotherapy may benefit healthy pregnancies and high-risk groups, but combination regimens show no additional advantages. Further research should confirm these findings.

  • ARTICLE
    Xin Xing , Guohua Zhang , Weize Kong , Liping Guo , Xiuxia Li , Zhipeng Wei , Yongbin Lu , Howard White , Yaolong Chen , Kehu Yang
    2025, 18(3): e70062. https://doi.org/10.1111/jebm.70062

    Objective: While prior investigations into the reporting of health economics (HE) have predominantly focused on guideline analyses at singular institutional or national levels, this study extends its scope to encompass diverse guidance documents issued transnationally across multiple institutions. Specifically, we evaluated the reporting of HE evidence in international clinical practice guidelines (CPGs) and expert consensus statements published between 2021 and 2023. The findings aim to inform the future revisions and development of such documents.

    Methods: A systematic PubMed search identified relevant CPGs and expert consensus statements within a specified period. Two independent researchers screened the literature, extracted economic evidence integrated into these documents, and employed descriptive analysis to summarize the reporting characteristics.

    Results: Of the 8931 screened publications, 3119 (34.9%) reported HE evidence. Among these 3119 publications, 237 (7.6%) incorporated HE evidence in formulating recommendations, 220 (7.1%) utilized HE evidence for evidence grading, and 2581 (82.8%) referenced HE evidence in explanatory notes accompanying the recommendations.

    Conclusions: Current CPGs and expert consensus statements exhibit low rates of HE evidence reporting, indicating that most international guideline developers have overlooked their applications. HE evidence—through cost-effectiveness and cost-utility analyses—can optimize medical resource allocation, support clinicians in patient-centered economic decision-making, and enhance health outcomes. Future guideline development should prioritize HE evidence integration to advance the scientific rigor and clinical applicability of the recommendations.
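
The percentages in the Results use two different denominators, which a short check makes explicit: the 34.9% is a share of all screened publications, while the other three are shares of the 3119 publications that reported HE evidence. The helper name is hypothetical; the counts are those printed in the abstract.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


screened = 8931
reported_he = 3119

share_reporting = pct(reported_he, screened)      # denominator: all screened
share_in_recs = pct(237, reported_he)             # denominator: HE-reporting
share_in_grading = pct(220, reported_he)
share_in_notes = pct(2581, reported_he)
```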

  • METHODOLOGY
    Luis Furuya-Kanamori , Xanthoula Rousou , Polychronis Kostoulas , Suhail A. R. Doi
    2025, 18(3): e70063. https://doi.org/10.1111/jebm.70063

    Systematic reviews and meta-analyses are considered the highest level of evidence, but their reliability can be undermined by publication bias. Traditional methods for assessing publication bias, such as funnel plots and p-value-based tests (e.g., Egger test), have notable limitations, including reliance on subjective interpretation and dependence on the number of studies included in a meta-analysis (k). The Doi plot and LFK index offer promising alternatives, providing improved visualization and quantification of plot asymmetry. This study revisits the application of the Doi plot and LFK index for detecting publication bias, addresses recent criticisms, and evaluates their performance compared to p-value-based methods using a simulation study. Simulations included scenarios with varying study numbers (k = 5, 10, 20, 50), study sample sizes (small, large), and simulated bias levels (ρ = 0, –0.3, –0.5, –0.9) generated using the Copas selection model. Diagnostic performance metrics (i.e., sensitivity and specificity) were estimated and compared for the LFK index and Egger test. The LFK index exhibited consistently higher sensitivity across varying k and simulated bias levels. In contrast, the Egger test was highly dependent on k, with sensitivity declining sharply in small meta-analyses (k < 20). Specificity of the LFK index adjusted with random error, while Egger test specificity remained fixed at ∼90%. The Doi plot and LFK index effectively address the limitations of traditional methods, offering robust k-independent performance and more reliable detection of publication bias. These findings support a transition to the Doi plot and LFK index for publication bias assessment in meta-analyses.
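
For readers unfamiliar with the comparator, the core of the Egger test can be sketched as a regression of each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error). An intercept far from zero suggests small-study asymmetry. This is a minimal illustration of the classic regression only: it omits the t-test on the intercept, it is not the authors' simulation code, and the function name and data are hypothetical. The LFK index is not implemented here.

```python
import numpy as np


def egger_regression(effects, standard_errors):
    """Return (intercept, slope) of the Egger regression.

    Regresses effect / SE on 1 / SE by ordinary least squares. The intercept
    measures funnel-plot asymmetry; the slope estimates the underlying effect.
    """
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    precision = 1.0 / se
    standardized = effects / se
    slope, intercept = np.polyfit(precision, standardized, deg=1)
    return float(intercept), float(slope)


# Hypothetical meta-analysis in which less precise studies report larger effects.
se = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
effects = 0.5 + 1.5 * se
intercept, slope = egger_regression(effects, se)
```

Because the significance test on the intercept relies on only k data points, its power falls when few studies are available, which is the dependence on k that the abstract describes.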

  • ARTICLE
    Shali Hao , Xiaomei Zhang , Lingxiao Li , Libin Mo , Yangguang Liu , Jiahuan Li , Wenli Wang , Jiandi Wu , Yuli Huang
    2025, 18(3): e70064. https://doi.org/10.1111/jebm.70064

    Aim: Detecting white-coat hypertension (WCH) by monitoring out-of-office blood pressure (BP) consumes resources and time. This study aimed to develop a prediction model based on patient characteristics obtained from clinical data.

    Methods: Individuals who underwent health check-up examinations at two large hospitals were screened. Participants with elevated office blood pressure readings at two separate visits and no history of hypertension were included. Based on home blood pressure monitoring, participants were classified as having WCH or sustained hypertension (SH). Independent predictors were identified using multivariate logistic regression on the training set, and a nomogram was built from these predictors.

    Results: In total, 383 outpatients with elevated office blood pressure were enrolled. Two hundred and thirty-three of them from one hospital were included for the development of the prediction model (training set), and 150 patients from another independent study site were included for external validation (external validation set). We identified six predictors linked to WCH diagnosis: office systolic blood pressure, body mass index, sex, total cholesterol, homocysteine, and heart rate. The area under the receiver operating characteristic curve (AUC) for the model was 0.792 in the training set and 0.692 in the external validation set. The calibration curve and decision curve analyses further demonstrated that the model had good performance for distinguishing WCH from SH.

    Conclusions: This prediction model can help clinicians to identify WCH individuals from those with SH, providing an effective tool for guiding personalized recommendations of abnormal blood pressure management.

  • LETTER
    Shitong Xie , Jiajun Yan , Preston Tse , Brittany Humphries , Feng Xie
    2025, 18(3): e70065. https://doi.org/10.1111/jebm.70065
  • ARTICLE
    Weilong Zhao , Danni Xia , Ziying Ye , Honghao Lai , Mingyao Sun , Jiajie Huang , Jiayi Liu , Jianing Liu , Long Ge
    2025, 18(3): e70067. https://doi.org/10.1111/jebm.70067

    Background: Formulating evidence-based recommendations for practice guidelines is a complex process that requires substantial expertise. Artificial intelligence (AI) shows promise for accelerating the guideline development process. This study evaluates the feasibility of leveraging five large language models (LLMs; ChatGPT-3.5, Claude-3 sonnet, Bard, ChatGLM-4, and Kimi chat) to generate recommendations based on structured evidence, assesses their concordance, and explores the potential for AI.

    Methods: The general and specific prompts were drafted and validated. We searched PubMed to include evidence-based guidelines related to health and lifestyle. We randomly selected one recommendation from every included guideline as the sample and extracted the evidence base supporting the selected recommendations. The prompts and evidence were fed into five LLMs to generate structured recommendations.

    Results: ChatGPT-3.5 demonstrated the highest proficiency in comprehensively extracting and synthesizing evidence to formulate novel insights. Bard consistently adhered to existing guideline principles, aligning its algorithm with these tenets. Claude generated fewer topical recommendations, focusing instead on evidence analysis and mitigating irrelevant information. ChatGLM-4 exhibited a balanced approach, combining evidence extraction with adherence to guideline principles. Kimi showed potential in generating concise and targeted recommendations. Among the six generated recommendations, average consistency ranged from 50% to 91.7%.

    Conclusion: The findings of this study suggest that LLMs hold immense potential in accelerating the formulation of evidence-based recommendations. LLMs can rapidly and comprehensively extract and synthesize relevant information from structured evidence, generating recommendations that align with the available evidence.

  • ARTICLE
    Hu Zhenyu , Haizhou Xiang , Zeng Ziran , Wu Jiali , Liu Li , Tang Jianwen , Long Menghong , Wang Maohua
    2025, 18(3): e70068. https://doi.org/10.1111/jebm.70068

    Background: Pediatric preoperative anxiety (PPA) is a prevalent condition that exhibits significant effects on the psychological and physiological status of children both preoperatively and postoperatively.

    Methods: We conducted a systematic review and network meta-analysis. PubMed, Embase, Web of Science Core Collection, and The Cochrane Library were searched up to December 1, 2024. RCTs of pediatric patients (0–14 years) receiving preoperative sedatives were included. The primary outcome was the Parental Separation Anxiety Scale (PSAS); secondary outcomes were the Mask Acceptance Scale (MAS), postoperative nausea/vomiting (PONV), and delirium/agitation (PODA).

    Results: Seventy studies (16,626 participants) were included. Five sedatives including midazolam, dexmedetomidine, ketamine (oral, intranasal, nebulized), clonidine (oral, intranasal), and melatonin (oral) were compared with placebo. Data from 20 interventions (5581 patients) assessed PPA. Intranasal dexmedetomidine (ID) showed highest single-drug efficacy (SUCRA: PSAS 68.1%, MAS 48.8%, PONV 65.7%, PODA 67.8%). Oral ketamine (OK) and midazolam (OM/IM) were effective alternatives. Combined regimens were promising but inconclusive.

    Conclusions: ID significantly alleviated PPA with minimal adverse effects in single-drug regimens (optimal dose: 1–2 µg/kg). OK, OM or IM served as potential alternative options for clinical application. While combination regimens (notably OM+OK) demonstrated superior efficacy across outcomes, small sample sizes necessitate cautious interpretation, underscoring the need for future comparative studies.
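
The SUCRA values quoted in the Results summarize a treatment's rank distribution in one number between 0 and 1 (reported above as percentages). A minimal sketch of the standard definition follows: for each treatment, average the cumulative probabilities of being ranked within the top 1, top 2, and so on up to the second-to-last rank. The function name and the rank-probability matrix are hypothetical and are not taken from the review.

```python
import numpy as np


def sucra(rank_probabilities) -> np.ndarray:
    """Surface under the cumulative ranking curve for each treatment.

    rank_probabilities[i][k] is the probability that treatment i takes rank
    k + 1, where rank 1 is best. Each row must sum to 1.
    """
    p = np.asarray(rank_probabilities, dtype=float)
    n_ranks = p.shape[1]
    # Cumulative probability of being within the top k ranks, for k < n_ranks.
    cumulative = np.cumsum(p, axis=1)[:, :-1]
    return cumulative.sum(axis=1) / (n_ranks - 1)


# Hypothetical rank probabilities for three treatments.
ranks = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
scores = sucra(ranks)
```

A treatment that is certain to rank first scores 1 and one that is certain to rank last scores 0, so intermediate values such as those in the abstract reflect uncertainty in the ranking rather than effect size.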

  • LETTER
    Huai Heng Loh , Siow Phing Tay , Ai Jiun Koa , Mei Ching Yong , Asri Said , Chee Shee Chai , Natasya Marliana Abdul Malik , Anselm Ting Su , Bonnie Bao Chee Tang , Florence Hui Sieng Tan , Norlela Sukor
    2025, 18(3): e70071. https://doi.org/10.1111/jebm.70071
  • ARTICLE
    Lanwei Guo , Jiani Yuan , Lin Cai , Chenxin Zhu , Yan Zheng , Haiyan Yang , Yanyan Liu
    2025, 18(3): e70072. https://doi.org/10.1111/jebm.70072

    Objective: China faces a significant burden of gastrointestinal tumors driven by socioeconomic, environmental, and lifestyle factors. Using Global Burden of Disease (GBD) 2021 data, this study analyzes epidemiological trends and disease burden for six major gastrointestinal cancers (esophagus, gastric, colorectum, liver, pancreas, gallbladder and biliary tract) in China (1990–2021).

    Methods: The GBD 2021 was used to extract the incidence, mortality, and disability-adjusted life years (DALYs) data of gastrointestinal tumors in China. Age-standardized rates (ASRs) and 95% uncertainty intervals (UIs) were calculated. Temporal trends were assessed by joinpoint regression analysis, and average annual percent change (AAPC) and annual percentage change (APC) were calculated and analyzed stratified by gender and age group.

    Results: In 2021, China recorded 1.96 million new gastrointestinal cancer cases, with 1.35 million deaths and 33.07 million DALYs. Gastric cancer led in mortality, and colorectal cancer demonstrated the most rapid incidence growth (AAPC = 1.68). Significant reductions were observed in gastric cancer age-standardized mortality rates (ASMR) (AAPC = –2.44) and esophageal cancer age-standardized disability-adjusted life year rates (ASDR) (AAPC = –2.31). Gender disparities were particularly pronounced in esophageal cancer, with the male-to-female mortality ratio (M/F) escalating from 2.50 (1990) to 4.12 (2021). The age group with the highest mortality burden was 70–74, while the age group with the most significant loss of DALYs was 65–69.

    Conclusion: China has significantly reduced gastrointestinal cancer burden, but gender and age disparities persist, necessitating targeted interventions. Future efforts should focus on tertiary prevention for high-risk groups, especially males and the elderly, while enhancing molecular subtyping and regional data stratification for precision cancer control.

  • ARTICLE
    Fen-Fen Li , Ke Han , Zi-Yue Fu , Bing-Yu Liang , Yan-Xun Han , Yu-Chen Liu , Ye-Hai Liu , Bu-Sheng Tong , Hai-Feng Pan
    2025, 18(3): e70073. https://doi.org/10.1111/jebm.70073

    Aim: Climate change has intensified the prevalence of chronic diseases, particularly autoimmune diseases (ADs), which severely affect the health and labor market participation of the working-age population. While ADs are not typically fatal, their chronic nature and high disability rates lead to significant labor force attrition. This study explores the impact of ADs on the labor market, particularly in regions affected by climate change.

    Methods: This study integrates labor market data with re-estimated ADs burden data from 1990 to 2021. Using time series analysis, multivariate regression, and geographic variation analysis, the research examines the relationship between ADs burden and labor force participation, with a focus on the exacerbating effects of climate change. Data were sourced from the International Labour Organization (ILO) and the Global Burden of Disease (GBD) database.

    Results: In 2021, the global labor force with ADs was 86,295,350, with a prevalence rate of 1644.55 per 100,000. Women had a significantly higher prevalence (1841.96 per 100,000) than men (1448.6 per 100,000). The total disability-adjusted life years (DALYs) for ADs was 18,513,645 person-years, with women experiencing higher DALYs (386.3 per 100,000). Regions severely affected by climate change showed increased ADs prevalence and a decline in labor force participation.

    Conclusion: ADs significantly contribute to global labor market decline, with climate change amplifying the health burden. Gender disparities are notable, and ADs' impact on labor force participation highlights the need for comprehensive public health policies and labor market interventions.

  • LETTER
    Jie Hao , Zixuan Yao , Andréas Remis , Xin Yu
    2025, 18(3): e70074. https://doi.org/10.1111/jebm.70074
  • REVIEW
    Fengxian Chen , Yan Li , Yaolong Chen , Zhaoxiang Bian , La Duo , Qingguo Zhou , Lu Zhang
    2025, 18(3): e70075. https://doi.org/10.1111/jebm.70075

    The application of artificial intelligence (AI) in healthcare has become increasingly widespread, showing significant potential in assisting with diagnosis and treatment. However, generative AI (GAI) models often produce “hallucinations” (plausible but factually incorrect or unsubstantiated outputs) that threaten clinical decision-making and patient safety. This article systematically analyzes the causes of hallucinations across data, training, and inference dimensions and proposes multi-dimensional strategies to mitigate them. Our findings support three conclusions: technical optimization through knowledge graphs and multi-stage training significantly reduces hallucinations; clinical integration through expert feedback loops and multidisciplinary workflows enhances output reliability; and robust evaluation systems that combine adversarial testing and real-world validation substantially improve factual accuracy in clinical settings. These integrated strategies underscore the importance of harmonizing technical advancements with clinical governance to develop trustworthy, patient-centric AI systems.

  • LETTER
    Chengliang Zhong , Shengxuan Guo , Yimin Yang , Aizhen Wang , Zheng Xue , Mengqing Wang , Guihua Song , Kun Yang , Hai Wang , Wei Zhong , Ya Gao , Zhigang Liu , Minghui Wang , Yuyan Chen , Xinmin Li , Siyuan Hu
    2025, 18(3): e70077. https://doi.org/10.1111/jebm.70077
  • ARTICLE
    Bin Ma , Yuanmin Jia , Haixia Wang , Ou Chen
    2025, 18(3): e70078. https://doi.org/10.1111/jebm.70078

    Objective: To evaluate and compare the effectiveness of adherence-enhancement strategies in patients with chronic kidney disease (CKD).

    Methods: Nine databases (PubMed, Embase, The Cochrane Library, Web of Science, Scopus, CNKI, VIP, WanFang, and CBM) were searched for randomized controlled trials (RCTs) up to April 1, 2025. Two reviewers independently screened records, extracted data, and assessed risk of bias with the Cochrane Risk of Bias 2.0 tool. Certainty of evidence was appraised using the Confidence in Network Meta-Analysis (CINeMA) tool. Network meta-analysis was performed, and the surface under the cumulative ranking curve (SUCRA) was calculated to rank interventions. The review was registered in PROSPERO (CRD42024604771).
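
    For readers unfamiliar with SUCRA, the following minimal sketch shows the usual calculation from a treatment's rank probabilities: the cumulative probabilities of being ranked in the top k positions are averaged over k = 1, ..., a − 1, where a is the number of treatments, and expressed here as a percentage. The function name and the input probabilities are hypothetical; the abstract does not report rank probabilities or the software used.

```python
def sucra(rank_probs):
    """SUCRA (as a percentage) for one treatment.

    rank_probs[k] is the probability that the treatment holds rank k + 1,
    with rank 1 being the best. The probabilities must sum to 1.
    """
    n_treatments = len(rank_probs)
    if n_treatments < 2:
        raise ValueError("SUCRA needs at least two competing treatments")
    if abs(sum(rank_probs) - 1.0) > 1e-9:
        raise ValueError("rank probabilities must sum to 1")

    cumulative = 0.0
    area = 0.0
    # Sum cumulative probabilities for ranks 1 .. a-1 (the last is always 1).
    for p in rank_probs[:-1]:
        cumulative += p
        area += cumulative
    return 100.0 * area / (n_treatments - 1)


# Hypothetical rank probabilities for one of four treatments.
print(round(sucra([0.6, 0.3, 0.1, 0.0]), 1))
```

    A treatment that is certain to rank first scores 100 and one that is certain to rank last scores 0, so values such as those reported in the Results are read on that scale.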

    Results: Thirty-five RCTs with 5084 patients were included. Evidence quality was limited by high risk of bias and low certainty. For medication adherence, education plus phone follow-up with short message service education showed the greatest effect (standardized mean difference [SMD] = 2.15, 95% confidence interval [CI] 1.09–3.21; SUCRA = 98.7), followed by empowerment (SMD = 1.19, 95% CI 0.20–2.18; SUCRA = 82.1). For diet adherence, education with phone follow-up was most effective (SMD = 6.68, 95% CI 5.64–7.71; SUCRA = 100), with empowerment also beneficial (SMD = 1.83, 95% CI 1.23–2.43; SUCRA = 87.9). For fluid adherence, education with medication management and pharmacist follow-up was most effective on scale-based outcomes (SMD = 3.06, 95% CI 2.18–3.94; SUCRA = 99.8), while cognitive behavioral therapy reduced interdialytic weight gain (SMD = −0.76, 95% CI −1.30 to −0.22; SUCRA = 71.1).
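
    The reported estimates can be cross-checked against their intervals with a small calculation. The sketch below assumes each 95% CI is a symmetric normal-approximation interval (an assumption; the abstract does not state how the intervals were derived), in which case the midpoint should match the point estimate and the half-width divided by the 97.5% normal quantile gives an implied standard error. The function name is hypothetical.

```python
from statistics import NormalDist


def implied_midpoint_and_se(lower, upper, level=0.95):
    """Midpoint and implied standard error of a symmetric normal-theory CI."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    midpoint = (lower + upper) / 2.0
    standard_error = (upper - lower) / (2.0 * z)
    return midpoint, standard_error


# Medication adherence, education plus phone follow-up with SMS education:
# reported SMD = 2.15, 95% CI 1.09 to 3.21.
mid, se = implied_midpoint_and_se(1.09, 3.21)
print(round(mid, 2), round(se, 3))
```

    Here the midpoint agrees with the reported SMD of 2.15, which is consistent with a symmetric interval for this comparison.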

    Conclusions: Adherence-enhancement strategies improve medication, diet, and fluid adherence in CKD. High-quality RCTs are needed to confirm these findings.