Large language models for bioinformatics

Wei Ruan, Yanjun Lyu, Jing Zhang, Jiazhang Cai, Peng Shu, Yang Ge, Yao Lu, Shang Gao, Yue Wang, Peilong Wang, Lin Zhao, Tao Wang, Yufang Liu, Luyang Fang, Ziyu Liu, Zhengliang Liu, Yiwei Li, Zihao Wu, Junhao Chen, Hanqi Jiang, Yi Pan, Zhenyuan Yang, Jingyuan Chen, Shizhe Liang, Wei Zhang, Terry Ma, Yuan Dou, Jianli Zhang, Xinyu Gong, Qi Gan, Yusong Zou, Zebang Chen, Yuanxin Qian, Shuo Yu, Jin Lu, Kenan Song, Xianqiao Wang, Andrea Sikora, Gang Li, Xiang Li, Quanzheng Li, Yingfeng Wang, Lu Zhang, Yohannes Abate, Lifang He, Wenxuan Zhong, Rongjie Liu, Chao Huang, Wei Liu, Ye Shen, Ping Ma, Hongtu Zhu, Yajun Yan, Dajiang Zhu, Tianming Liu

Quant. Biol. 2026, 14(1): e70014. DOI: 10.1002/qub2.70014
REVIEW ARTICLE

Abstract

With the rapid advancement of large language model technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications of these models. This survey addresses that need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.

Keywords

bioinformatics-specific language models / biological systems / biomedical AI / large language models / life active factors

Cite this article

Wei Ruan, Yanjun Lyu, Jing Zhang, Jiazhang Cai, Peng Shu, Yang Ge, Yao Lu, Shang Gao, Yue Wang, Peilong Wang, Lin Zhao, Tao Wang, Yufang Liu, Luyang Fang, Ziyu Liu, Zhengliang Liu, Yiwei Li, Zihao Wu, Junhao Chen, Hanqi Jiang, Yi Pan, Zhenyuan Yang, Jingyuan Chen, Shizhe Liang, Wei Zhang, Terry Ma, Yuan Dou, Jianli Zhang, Xinyu Gong, Qi Gan, Yusong Zou, Zebang Chen, Yuanxin Qian, Shuo Yu, Jin Lu, Kenan Song, Xianqiao Wang, Andrea Sikora, Gang Li, Xiang Li, Quanzheng Li, Yingfeng Wang, Lu Zhang, Yohannes Abate, Lifang He, Wenxuan Zhong, Rongjie Liu, Chao Huang, Wei Liu, Ye Shen, Ping Ma, Hongtu Zhu, Yajun Yan, Dajiang Zhu, Tianming Liu. Large language models for bioinformatics. Quant. Biol., 2026, 14(1): e70014. DOI: 10.1002/qub2.70014


References

[1]

Devlin J , Chang M-W , Lee K , Toutanova K . BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of naacL-HLT, Minneapolis, Minnesota, 1; Annu Rev Plant Biol. 2019. p. 2.

[2]

Brown T , Mann B , Ryder N , Subbiah M , Kaplan JD , Dhariwal P , et al. Language models are few-shot learners. Adv Neural Inf Process Syst. 2020; 33: 1877- 901.

[3]

Liu J , Yang M , Yu Y , Xu H , Li K , Zhou X . Large language models in bioinformatics: applications and perspectives. 2024. Preprint at arXiv: 2401.04155v1.

[4]

Sarumi OA , Heider D . Large language models and their applications in bioinformatics. Comput Struct Biotechnol J. 2024; 23: 3498- 505.

[5]

Tripathi S , Gabriel K , Tripathi PK , Kim E . Large language models reshaping molecular biology and drug development. Chem Biol Drug Des. 2024; 103 (6): e14568.

[6]

[7]

Lin Z , Zhang L , Wu Z , Chen Y , Dai H , Yu X , et al. When brain-inspired AI meets AGI. 2023. Preprint at arXiv: 2303.15935.

[8]

Zhong T , Liu Z , Pan Y , Zhang Y , Zhou Y , Liang S , et al. Evaluation of openAI o1:opportunities and challenges of AGI. 2024. Preprint at arXiv: 2409.18486.

[9]

Ma C , Wu Z , Wang J , Xu S , Wei Y , Liu Z , et al. An iterative optimizing framework for radiology report summarization with chatGPT. IEEE Transactions on Artificial Intelligence. 2024; 5 (8): 4163- 75.

[10]

Dai H , Liu Z , Liao W , Huang X , Wu Z , Lin Z , et al. ChatAug: leveraging chatGPT for text data augmentation. 2023. Preprint at arXiv: 2302.13007.

[11]

Liu Z , Li Y , Peng S , Zhong A , Yang L , Ju C , et al. Radiology-LLaMA2: best-in-class large language model for radiology. 2023. Preprint at arXiv: 2309.06419.

[12]

Liao W , Liu Z , Dai H , Xu S , Wu Z , Zhang Y , et al. Differentiating chatGPT-generated and human-written medical texts:quantitative study. JMIR Medical Education. 2023; 9 (1): e48904.

[13]

Liu Z , He X , Liu L , Liu T , Zhai X . Context matters: a strategy to pre-train language model for science education. 2023. Preprint at arXiv: 2301.12031.

[14]

Rezayi S , Dai H , Liu Z , Wu Z , Hebbar A , Burns AH , et al. ClinicalRadioBERT: knowledge-infused few shot learning for clinical notes named entity recognition. In: Machine learning in medical imaging:13th international workshop, MLMI 2022, held in conjunction with MICCAI 2022, Singapore, September 18, 2022, proceedings. Springer; 2022. p. 269- 78.

[15]

Dai H , Li Y , Liu Z , Lin Z , Wu Z , Song S , et al. AD-autoGPT: an autonomous GPT for Alzheimer's disease infodemiology. 2023. Preprint at arXiv: 2306.10095.

[16]

Zhao H , Qian L , Pan Y , Zhong T , Hu J-Y , Yao J , et al. Ophtha-LLaMA2: a large language model for ophthalmology. 2023. Preprint at arXiv: 2312.04906.

[17]

Zhang K , Zhou R , Adhikarla E , Yan Z , Liu Y , Yu J , et al. A generalist vision-language foundation model for diverse biomedical tasks. Nat Med. 2024; 30 (11): 1- 13.

[18]

Liu Z , Wang P , Li Y , Holmes J , Peng S , Zhang L , et al. RadOnc-GPT: a large language model for radiation oncology. 2023. Preprint at arXiv: 2309.10160.

[19]

Liu Z , Wang P , Li Y , Holmes JM , Peng S , Zhang L , et al. Fine-tuning large language models for radiation oncology, a highly specialized healthcare domain. International Journal of Particle Therapy. 2024; 12: 100428.

[20]

Lyu Y , Wu Z , Zhang L , Zhang J , Li Y , Ruan W , et al. GP-GPT: large language model for gene-phenotype mapping. 2024. Preprint at arXiv: 2409.09825.

[21]

Wang J , Jiang H , Liu Y , Ma C , Zhang X , Pan Y , et al. A comprehensive review of multimodal large language models:performance and challenges across different tasks. 2024. Preprint at arXiv: 2408.01319.

[22]

[23]

Liu Z , Zhang L , Wu Z , Yu X , Cao C , Dai H , et al. Surviving chatGPT in healthcare. Frontiers in Radiology. 2024; 3: 1224682.

[24]

Huang Y , Sun L , Wang H , Wu S , Zhang Q , Yuan L , et al. TrustLLM: trustworthiness in large language models. 2024. Preprint at arXiv: 2401.05561.

[25]

Yang Z , Liu Z , Zhang J , Lu C , Jiaxin T , Zhong T , et al. Analyzing nobel prize literature with large language models. 2024. Preprint at arXiv: 2410.18142.

[26]

Wang J , Shi E , Yu S , Wu Z , Ma C , Dai H , et al. Prompt engineering for healthcare: methodologies and applications. 2023. Preprint at arXiv: 2304.14670.

[27]

Liu Z , Zhong A , Li Y , Yang L , Ju C , Wu Z , et al. Tailoring large language models to radiology: a preliminary approach to LLM adaptation for a highly specialized domain. In: International workshop on machine learning in medical imaging. Springer. 2023. p. 464- 73.

[28]

Jie T , Hou J , Wu Z , Peng S , Liu Z , Xiang Y , et al. Assessing large language models in mechanical engineering education: a study on mechanics-focused conceptual understanding. 2024. Preprint at arXiv: 2401.12983.

[29]

Lee G-G , Shi L , Latif E , Gao Y , Bewersdorf A , Nyaaba M , et al. Multimodality of AI for education: towards artificial general intelligence. 2023. Preprint at arXiv: 2312.06037.

[30]

Peng S , Zhao H , Jiang H , Li Y , Xu S , Pan Y , et al. LLMs for coding and robotics education. 2024. Preprint at arXiv: 2402.06116.

[31]

Latif E , Mai G , Nyaaba M , Wu X , Liu N , Lu G , et al. Artificial general intelligence (AGI) for education. 2023. Preprint at arXiv: 2304.12479,1.

[32]

Wang J , Wu Z , Li Y , Jiang H , Peng S , Shi E , et al. Large language models for robotics:opportunities, challenges, and perspectives. 2024. Preprint at arXiv: 2401.04334.

[33]

Liu Y , He H , Han T , Zhang X , Liu M , Tian J , et al. Understanding LLMs: a comprehensive overview from training to inference. 2024. Preprint at arXiv: 2401.02038.

[34]

Latif E , Zhou Y , Guo S , Gao Y , Shi L , Nayaaba M , et al. A systematic assessment of openAI o1-preview for higher order thinking in education. 2024. Preprint at arXiv: 2410.21287.

[35]

Xiang L , Lin Z , Zhang L , Wu Z , Liu Z , Jiang H , et al. Artificial general intelligence for medical imaging analysis. IEEE Reviews in Biomedical Engineering. 2024.

[36]

Wang P , Holmes J , Liu Z , Chen D , Liu T , Shen J , et al. A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options. 2024. Preprint at arXiv: 2412.10622.

[37]

Ding Z , Liu Z , Jiang H , Gao Y , Zhai X , Liu T , et al. Foundation models for low-resource language education (vision paper). 2024. Preprint at arXiv: 2412.04774.

[38]

Chen J , Peng S , Li Y , Zhao H , Jiang H , Pan Y , et al. Queen: a large language model for quechua-English translation. 2024. Preprint at arXiv: 2412.05184.

[39]

Zhang Y , Pan Y , Zhong T , Dong P , Xie K , Liu Y , et al. Potential of multimodal large language models for data mining of medical images and free-text reports. Meta-Radiology. 2024; 2 (4): 100103.

[40]

Zhong T , Yang Z , Liu Z , Zhang R , Liu Y , Sun H , et al. Opportunities and challenges of large language models for low-resource languages in humanities research. 2024. Preprint at arXiv: 2412.04497.

[41]

Jiang H , Pan Y , Chen J , Liu Z , Zhou Y , Peng S , et al. OracleSage:towards unified visual-linguistic understanding of oracle bone scripts through cross-modal knowledge fusion. 2024. Preprint at arXiv: 2411.17837.

[42]

Liao W , Liu Z , Zhang Y , Huang X , Liu N , Liu T , et al. Zero-shot relation triplet extraction as next-sentence prediction. Knowl Base Syst. 2024; 304: 112507.

[43]

Zhang L , Liu Z , Zhang L , Wu Z , Yu X , Holmes J , et al. Generalizable and promptable artificial intelligence model to augment clinical delineation in radiation oncology. Medical Physics. 2024; 51 (3): 2187- 99.

[44]

Tan C , Cao Q , Li Y , Zhang J , Yang X , Zhao H , et al. On the promises and challenges of multimodal foundation models for geographical, environmental, agricultural, and urban planning applications. 2023. Preprint at arXiv: 2312.17016.

[45]

Shi Y , Peng S , Liu Z , Wu Z , Li Q , Xiang L . MGH radiology LLaMA: a LLaMA 3 70b model for radiology. 2024. Preprint at arXiv: 2408.11848.

[46]

Peng S , Chen J , Liu Z , Wang H , Wu Z , Zhong T , et al. Transcending language boundaries:harnessing LLMs for low-resource language translation. 2024. Preprint at arXiv: 2411.11295.

[47]

Wang J , Zhao H , Yang Z , Peng S , Chen J , Sun H , et al. Legal evalutions and challenges of large language models. 2024. Preprint at arXiv: 2411.10137.

[48]

Holmes J , Zhang L , Ding Y , Feng H , Liu Z , Liu T , et al. Benchmarking a foundation large language model on its ability to relabel structure names in accordance with the American Association of Physicists in Medicine Task Group-263 report. Practical Radiation Oncology. 2024; 14 (6): e515- 21.

[49]

Lin Z , Wu Z , Dai H , Liu Z , Zhang T , Zhu D , et al. Embedding human brain function via transformer. In:International conference on medical image computing and computer-assisted intervention. Springer. 2022. p. 366- 75.

[50]

Lin Z , Wu Z , Dai H , Liu Z , Hu X , Zhang T , et al. A generic framework for embedding human brain function with temporally correlated autoencoder. Med Image Anal. 2023; 89: 102892.

[51]

Liu Z , Li Y , Zolotarevych O , Yang R , Liu T . LLM-POTUS score: a framework of analyzing presidential debates with large language models. 2024. Preprint at arXiv: 2409.08147.

[52]

Yang Z , Lin X , He Q , Huang Z , Liu Z , Jiang H , et al. Examining the commitments and difficulties inherent in multimodal foundation models for street view imagery. 2024. Preprint at arXiv: 2408.12821.

[53]

Gong X , Zhang J , Qi G , Teng Y , Hou J , Lyu Y , et al. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv. 2024; 74: 108399.

[54]

Mukherjee S , Gamble P , Sanz Ausin M , Kant N , Aggarwal K , Manjunath N , et al. Polaris: a safety-focused LLM constellation architecture for healthcare. 2024. Preprint at arXiv: 2403.13313.

[55]

Xu S , Wu Z , Zhao H , Shu P , Liu Z , Liao W , et al. Reasoning before comparison:LLM-enhanced semantic similarity metrics for domain specialized text analysis. 2024. Preprint at arXiv: 2402.11398.

[56]

Latif E , Fang L , Ma P , Zhai X . Knowledge distillation of LLMs for automatic scoring of science assessments. In: International conference on artificial intelligence in education. Springer; 2024. p. 166- 74.

[57]

Liu Z , Holmes J , Liao W , Liu C , Zhang L , Feng H , et al. The radiation oncology NLP database. 2024. Preprint at arXiv: 2401.10995.

[58]

Wei Y , Zhong T , Zhang S , Li X , Zhang T , Lin Z , et al. Chat2Brain: a method for mapping open-ended semantic queries to brain activation maps. In: 2023 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE; 2023. p. 1523- 30.

[59]

Liu Z , Jiang H , Zhong T , Wu Z , Ma C , Li Y , et al. Holistic evaluation of GPT-4V for biomedical imaging. 2023. Preprint at arXiv: 2312.05256.

[60]

Liu Z , He M , Jiang Z , Wu Z , Dai H , Zhang L , et al. Survey on natural language processing in medical image analysis. Zhong nan da xue xue bao. Yi xue ban=Journal of Central South University. Medical Sciences. 2022; 47 (8): 981- 93.

[61]

Wu Z , Zhang L , Cao C , Yu X , Dai H , Ma C , et al. Exploring the trade-offs: unified large language models vs local fine-tuned models for highly-specific radiology NLI task. 2023. Preprint at arXiv: 2304.09138.

[62]

Xiao Z , Chen Y , Zhang L , Yao J , Wu Z , Yu X , et al. Instruction-ViT: multi-modal prompts for instruction learning in ViT. 2023. Preprint at arXiv: 2305.00201.

[63]

Cai H , Huang X , Liu Z , Liao W , Dai H , Wu Z , et al. Exploring multimodal approaches for Alzheimer's disease detection using patient speech transcript and audio data. 2023. Preprint at arXiv: 2307.02514.

[64]

Holmes J , Liu Z , Zhang L , Ding Y , Sio TT , McGee LA , et al. Evaluating large language models on a highly-specialized topic, radiation oncology physics. 2023. Preprint at arXiv: 2304.01938.

[65]

Liu Z , Wu Z , Hu M , Zhao B , Lin Z , Zhang T , et al. PharmacyGPT: the AI pharmacist. 2023. Preprint at arXiv: 2307.10432.

[66]

Guan Z , Wu Z , Liu Z , Wu D , Ren H , Li Q , et al. CohortGPT: an enhanced GPT for participant recruitment in clinical study. 2023. Preprint at arXiv: 2307.11346.

[67]

Liu Z , Zhong T , Li Y , Zhang Y , Pan Y , Zhao Z , et al. Evaluating large language models for radiology natural language processing. 2023. Preprint at arXiv: 2307.13693.

[68]

Cai H , Huang X , Liu Z , Liao W , Dai H , Wu Z , et al. Multimodal approaches for Alzheimer's detection using patients' speech and transcript. In: International conference on brain informatics. Springer; 2023. p. 395- 406.

[69]

Shi Y , Xu S , Liu Z , Liu T , Xiang L , Liu N . MedEdit: model editing for medical question answering with external knowledge bases. 2023. Preprint at arXiv: 2309.16035.

[70]

Tang C , Liu Z , Ma C , Wu Z , Li Y , Liu W , et al. PolicyGPT: automated analysis of privacy policies with large language models. 2023. Preprint at arXiv: 2309.10238.

[71]

Liu Z , Li Y , Cao Q , Chen J , Yang T , Wu Z , et al. Transformation vs tradition: artificial general intelligence (AGI) for arts and humanities. 2023. Preprint at arXiv: 2310.19626.

[72]

Zhong T , Zhao W , Zhang Y , Pan Y , Dong P , Jiang Z , et al. ChatRadio-valuer: a chat large language model for generalizable radiology report generation based on multi-institution and multi-system data. 2023. Preprint at arXiv: 2310.05242.

[73]

Gong X , Holmes J , Li Y , Liu Z , Gan Q , Wu Z , et al. Evaluating the potential of leading large language models in reasoning biology questions. 2023. Preprint at arXiv: 2311.07582.

[74]

Liao W , Liu Z , Zhang Y , Huang X , Qi F , Ding S , et al. Coarse-to-fine knowledge graph domain adaptation based on distantly-supervised iterative training. In: 2023 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE; 2023. p. 1294- 9.

[75]

Holmes J , Peng R , Li Y , Hu J , Liu Z , Wu Z , et al. Evaluating multiple large language models in pediatric ophthalmology. 2023. Preprint at arXiv: 2311.04368.

[76]

Rezayi S , Liu Z , Wu Z , Chandra D , Ge B , Dai H , et al. Exploring new frontiers in agricultural NLP:investigating the potential of large language models for food applications. IEEE Transactions on Big Data. 2024; 11 (3): 1235- 46.

[77]

Dou F , Ye J , Geng Y , Lu Q , Niu W , Sun H , et al. Towards artificial general intelligence (AGI) in the internet of things (IoT): opportunities and challenges. 2023. Preprint at arXiv: 2309.07438.

[78]

Holmes J , Zhang L , Ding Y , Feng H , Liu Z , Liu T , et al. Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report. 2023. Preprint at arXiv: 2310.03874.

[79]

Sennrich R . Neural machine translation of rare words with subword units. 2015. Preprint at arXiv:1508.07909.

[80]

Schuster M , Nakajima K . Japanese and Korean voice search. In:2012 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE; 2012. p. 5149- 52.

[81]

Kudo T . Subword regularization:improving neural network translation models with multiple subword candidates. 2018. Preprint at arXiv:1804.10959.

[82]

Bengio Y , Courville A , Vincent P . Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013; 35 (8): 1798- 828.

[83]

Vaswani A , Shazeer N , Parmar N , Uszkoreit J , Jones L , Gomez AN , et al. Attention is all you need. Adv Neural Inf Process Syst. 2017; 30.

[84]

Lin Z , Feng M , dos Santos CN , Yu M , Xiang B , Zhou B , et al. A structured self-attentive sentence embedding. 2017. Preprint at arXiv:1703.03130.

[85]

Liu Z , Alavi A , Li M , Zhang X . Self-supervised contrastive learning for medical time series: a systematic review. Sensors. 2023; 23 (9): 4221.

[86]

Hastie T , Tibshirani R , Friedman J , Hastie T , Tibshirani R , Friedman J . Overview of supervised learning. In: The elements of statistical learning: data mining, inference, and prediction; 2009. p. 9- 41.

[87]

Geoffrey EH , Salakhutdinov RR . Reducing the dimensionality of data with neural networks. Science. 2006; 313 (5786): 504- 7.

[88]

Kramer MA . Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991; 37 (2): 233- 43.

[89]

Vincent P , Larochelle H , Bengio Y . Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning; 2008; p. 1096- 103.

[90]

Vincent P , Larochelle H , Lajoie I , Bengio Y , Manzagol P-A , Bottou L . Stacked denoising autoencoders:learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010; 11 (12): 3371- 408.

[91]

Petroni F , Rocktäschel T , Lewis P , Bakhtin A , Wu Y , Miller AH , et al. Language models as knowledge bases? 2019. Preprint at arXiv:1909.01066.

[92]

Howard J , Ruder S . Universal language model fine-tuning for text classification. 2018. Preprint at arXiv:1801.06146.

[93]

Raffel C , Shazeer N , Roberts A , Lee K , Narang S , Matena M , et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 2020; 21 (1): 5485- 551.

[94]

Han X , Zhang Z , Ding N , Gu Y , Liu X , Huo Y , et al. Pre-trained models:past, present and future. AI Open. 2021; 2: 225- 50.

[95]

Koumakis L . Deep learning models in genomics; are we there yet? Comput Struct Biotechnol J. 2020; 18: 1466- 73.

[96]

Watson JD , Crick FH . On protein synthesis. In:The symposia of the society for experimental biology, 12; 1958. p. 138- 63.

[97]

Schlitt T , Palin K , Rung J , Dietmann S , Lappe M , Ukkonen E , et al. From gene networks to gene function. Genome Res. 2003; 13 (12): 2568- 76.

[98]

Kim CY , Baek S , Cha J , Yang S , Kim E , Marcotte EM , et al. HumanNet v3: an improved database of human gene networks for disease research. Nucleic Acids Res. 2022; 50 (D1): D632- 9.

[99]

Riad ABMKI , Abdul Barek M , Rahman MM , Akter MS , Islam T , Rahman MA , et al. Enhancing HIPAA compliance in AI-driven mHealth devices security and privacy. In: 2024 IEEE 48th annual computers, software, and applications conference (COMPSAC). IEEE; 2024. p. 2430- 5.

[100]

Bartels M . A balancing act:data protection compliance of artificial intelligence. GRUR Int. 2024; 73 (6): 526- 37.

[101]

Ofer D , Brandes N , Linial M . The language of proteins:NLP, machine learning & protein sequences. Comput Struct Biotechnol J. 2021; 19: 1750- 8.

[102]

Lin Z , Akin H , Rao R , Hie B , Zhu Z , Lu W , et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023; 379 (6637): 1123- 30.

[103]

Hayes T , Rao R , Akin H , Sofroniew NJ , Oktay D , Lin Z , et al. Simulating 500 million years of evolution with a language model. 2024. Preprint at bioRxiv: 2024.07.01.600583.

[104]

Lynch VJ , Leclerc RD , May G , Wagner GP . Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat Genet. 2011; 43 (11): 1154- 9.

[105]

Gene Ontology Consortium . The gene ontology (GO) database and informatics resource. Nucleic Acids Res. 2004; 32 (Suppl l_1): D258- 61.

[106]

Ali A , Zhang J . Optimizing large language models:performance, efficiency, and scalability. East Eur J Multidiscip Res. 2024; 3 (2): 13- 9.

[107]

Rostam ZRK , Szénási S , Kertész G . Achieving peak performance for large language models: a systematic review. IEEE Access. 2024; 12: 96017- 50.

[108]

Javed H , El-Sappagh S , Abuhmed T . Robustness in deep learning models for medical diagnostics:security and adversarial challenges towards robust AI applications. Artif Intell Rev. 2025; 58 (1): 1- 107.

[109]

Naresh G , Thangavelu P . Integrating machine learning for health prediction and control in over-discharged li-nmc battery systems. Ionics. 2024; 30 (12): 8015- 32.

[110]

Wu T , Luo L , Li Y-F , Pan S , Vu T-T , Haffari G . Continual learning for large language models: a survey. 2024. Preprint at arXiv: 2402.01364.

[111]

Dohare S , Hernandez-Garcia JF , Lan Q , Rahman P , Mahmood AR , Sutton RS . Loss of plasticity in deep continual learning. Nature. 2024; 632 (8026): 768- 74.

[112]

Bansal S , Sindhi V , Singla BS . Exploration of deep learning and transfer learning techniques in bioinformatics. In:Applying machine learning techniques to bioinformatics: few-shot and zero-shot methods. IGI Global; 2024. p. 238- 57.

[113]

Mishra SK , Singh A , Dubey KB , Kumar Paul P , Singh V . Role of bioinformatics in data mining and big data analysis. In:Advances in bioinformatics. Springer. 2024. p. 271- 7.

[114]

Yan B , Li K , Xu M , Dong Y , Zhang L , Ren Z , et al. On protecting the data privacy of large language models (LLMs): a survey. 2024. Preprint at arXiv: 2403.05156.

[115]

Zheng JY , Zhang HN , Wang LX , Qiu WJ , Zheng HW , Zheng ZM . Safely learning with private data: a federated learning framework for large language model. 2024. Preprint at arXiv: 2406.14898.

[116]

Ye R , Wang W , Chai J , Li D , Li Z , Xu Y , et al. OpenFedLLM: training large language models on decentralized private data via federated learning. In: Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining; 2024. p. 6137- 47.

[117]

Mao W , Hou C , Zhang T , Lin X , Tang K , Lv H . Parse trees guided LLM prompt compression. 2024. Preprint at arXiv: 2409.15395.

[118]

Cai X , Wang C , Long Q , Zhou Y , Xiao M . Knowledge hierarchy guided biological-medical dataset distillation for domain LLM training. 2025. Preprint at arXiv: 2501.15108.

[119]

Zhao H , Ma C , Xu FZ , Kong L , Deng Z-H . Biomaze: benchmarking and enhancing large language models for biological pathway reasoning. 2025. Preprint at arXiv: 2502.16660.

[120]

Samek W , Montavon G , Lapuschkin S , Anders CJ , Müller K-R . Explaining deep neural networks and beyond: a review of methods and applications. Proc IEEE. 2021; 109 (3): 247- 78.

[121]

Jia D , Dong W , Socher R , Li L-J , Li K , Fei-Fei L . ImageNet: a large-scale hierarchical image database. In:2009 IEEE conference on computer vision and pattern recognition. IEEE; 2009. p. 248- 55.

[122]

Pourpanah F , Abdar M , Luo Y , Zhou X , Wang R , Lim CP , et al. A review of generalized zero-shot learning methods. IEEE Trans Pattern Anal Mach Intell. 2022; 45 (4): 4051- 70.

[123]

Wang Y , Yao Q , Kwok JT , Ni LM . Generalizing from a few examples: a survey on few-shot learning. ACM Comput Surv. 2020; 53 (3): 1- 34.

[124]

Navigli R , Conia S , Ross B . Biases in large language models: origins, inventory, and discussion. ACM J Data Inf Qual. 2023; 15 (2): 1- 21.

[125]

Li H , Zhu C , Zhang Y , Sun Y , Shui Z , Kuang W , et al. Task-specific fine-tuning via variational information bottleneck for weakly-supervised pathology whole slide image classification. In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2023. p. 7454- 63.

[126]

Zheng J , Hong H , Wang X , Su J , Liang Y , Wu S . Fine-tuning large language models for domain-specific machine translation. 2024. Preprint at arXiv: 2402.15061.

[127]

Liu H , Li C , Wu Q , Lee YJ . Visual instruction tuning. Adv Neural Inf Process Syst. 2024; 36: 34892- 916.

[128]

Ding N , Qin Y , Yang G , Wei F , Yang Z , Su Y , et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nat Mach Intell. 2023; 5 (3): 220- 35.

[129]

Hu EJ , Shen Y , Wallis P , Allen-Zhu Z , Li Y , Wang S , et al. Lora: low-rank adaptation of large language models. 2021. Preprint at arXiv: 2106.09685.

[130]

He R , Liu L , Ye H , Tan Q , Ding B , Cheng L , et al. On the effectiveness of adapter-based tuning for pretrained language model adaptation. 2021. Preprint at arXiv: 2106.03164.

[131]

Paul FC , Leike J , Brown T , Martic M , Legg S , Amodei D . Deep reinforcement learning from human preferences. Adv Neural Inf Process Syst. 2017; 30.

[132]

Schulman J , Wolski F , Dhariwal P , Radford A , Klimov O . Proximal policy optimization algorithms. 2017. Preprint at arXiv:1707.06347.

[133]

Rafailov R , Sharma A , Mitchell E , Manning CD , Ermon S , Finn C . Direct preference optimization:your language model is secretly a reward model. Adv Neural Inf Process Syst. 2024; 36: 53728- 41.

[134]

Amodei D , Olah C , Jacob S , Christiano P , Schulman J , Mané D . Concrete problems in AI safety. 2016. Preprint at arXiv:1606.06565.

[135]

Pan A , Jones E , Jagadeesan M , Steinhardt J . Feedback loops with language models drive in-context reward hacking. 2024. Preprint at arXiv: 2402.06627.

[136]

Hinton G . Distilling the knowledge in a neural network. 2015. Preprint at arXiv:1503.02531.

[137]

Xu X , Li M , Tao C , Shen T , Cheng R , Li J , et al. A survey on knowledge distillation of large language models. 2024. Preprint at arXiv: 2402.13116.

[138]

Zhang H , Chen J , Jiang F , Yu F , Chen Z , Li J , et al. HuatuoGPT, towards taming language model to be a doctor. 2023. Preprint at arXiv: 2305.15075.

[139]

Taori R , Gulrajani I , Zhang T , Dubois Y , Li X , Guestrin C , et al. Stanford Alpaca: an instruction-following LLaMA model. 2023;

[140]

Hsieh C-Y , Li C-L , Yeh C-K , Nakhost H , Fujii Y , Ratner A , et al. Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes. 2023. Preprint at arXiv: 2305.02301.

[141]

Liu A , Feng B , Xue B , Wang B , Wu B , Lu C , et al. DeepSeek-v3 technical report. 2024. Preprint at arXiv: 2412.19437.

[142]

Firoozi R , Tucker J , Tian S , Majumdar A , Sun J , Liu W , et al. Foundation models in robotics: applications, challenges, and the future. Int J Robot Res. 2023; 44 (5): 701- 39.

[143]

Liu J , Zhang C , Guo J , Zhang Y , Que H , Deng K , et al. DDK: distilling domain knowledge for efficient large language models. 2024. Preprint at arXiv: 2407.16154.

[144]

Fang L , Chen Y , Zhong W , Ma P . Bayesian knowledge distillation: a Bayesian perspective of distillation with uncertainty quantification. In: Forty-first international conference on machine learning.

[145]

Korattikara A , Rathod V , Murphy K , Welling M . Bayesian dark knowledge. 2015. Preprint at arXiv:1506.04416.

[146]

Touvron H , Lavril T , Izacard G , Martinet X , Jegou H , Grave E , et al. The LLaMA 3 herd of models. 2024. Preprint at arXiv: 2407.21783.

[147]

Xu S , Zhou Y , Liu Z , Wu Z , Zhong T , Zhao H , et al. Towards next-generation medical agent:how o1 is reshaping decision-making in medical scenarios. 2024. Preprint at arXiv: 2411.14461.

[148]

Tian S , Jin Q , Yeganova L , Lai P-T , Zhu Q , Chen X , et al. Opportunities and challenges for chatGPT and large language models in biomedicine and health. Briefings Bioinf. 2024; 25 (1): bbad493.

[149]

Liu Z , Li Y , Han L , Li J , Liu J , Zhao Z , et al. PDB-wide collection of binding data:current status of the PDBbind database. Bioinformatics. 2015; 31 (3): 405- 12.

[150]

Aly Abdelkader G , Kim J-D . Advances in protein-ligand binding affinity prediction via deep learning: a comprehensive study of datasets, data preprocessing techniques, and model architectures. Curr Drug Targets. 2024; 25 (15): 1041- 65.

[151]

Kitts A , Sherry S . The single nucleotide polymorphism database (DBSNP) of nucleotide sequence variation. In: McEntyre J, Ostell JJ, editors. The NCBI handbook. Bethesda: US National Center for Biotechnology Information; 2002.

[152]

Pal A , Kumar Umapathi L , Sankarasubbu M . MedMCQA: a large-scale multi-subject multi-choice dataset for medical domain question answering. In: Conference on health, inference, and learning. PMLR; 2022. p. 248- 60.

[153]

Jin D , Pan E , Oufattole N , Weng W-H , Fang H , Szolovits P . What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Appl Sci. 2021; 11 (14): 6421.

[154]

George T , Balikas G , Malakasiotis P , Partalas I , Zschunke M , Alvers MR , et al. An overview of the bioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinf. 2015; 16: 1- 28.

[155]

Nentidis A , Katsimpras G , Krithara A , Paliouras G . Overview of bioASQ tasks 12b and synergy12 in CLEF2024. In: Working notes of CLEF. 2024.

[156]

Jin Q , Dhingra B , Liu Z , Cohen WW , Lu X . PubmedQA: a dataset for biomedical research question answering. 2019. Preprint at arXiv:1909.06146.

[157]

Johnson AEW , Pollard TJ , Greenbaum NR , Lungren MP , Deng C-ying , C-ying Y , et al. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. 2019. Preprint at arXiv:1901.07042.

[158]

Wei C-H , Peng Y , Leaman R , Davis AP , Mattingly CJ , Jiao L , et al. Assessing the state of the art in biomedical relation extraction:overview of the biocreative v chemical-disease relation (CDR) task. Database. 2016; 2016: baw032.

[159]

Islamaj Doğan R , Leaman R , Lu Z . NCBI disease corpus: a resource for disease name recognition and concept normalization. J Biomed Inf. 2014; 47: 1- 10.

[160]

Taboureau O , Nielsen SK , Audouze K , Weinhold N , Edsgärd D , Roque FS , et al. ChemProt: a disease chemical biology database. Nucleic Acids Res. 2010; 39 (Suppl l_1): D367- 72.

[161]

Kim Kjærulff S , Wich L , Kringelum J , Jacobsen UP , Kouskoumvekaki I , Audouze K , et al. ChemProt-2.0:visual navigation in a disease chemical biology database. Nucleic Acids Res. 2012; 41 (D1): D464- 9.

[162]

Kringelum J , Kim Kjaerulff S , Brunak S , Lund O , Oprea TI , Taboureau O . ChemProt-3.0: a global chemical biology diseases mapping. Database. 2016; 2016: bav123.

[163]

Herrero-Zazo M , Segura-Bedmar I , Martínez P , Declerck T . The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions. J Biomed Inf. 2013; 46 (5): 914- 20.

[164]

Bravo À , Piñero J , Queralt-Rosinach N , Rautschka M , Furlong LI . Extraction of relations between genes and diseases from text and large-scale data analysis:implications for translational research. BMC Bioinf. 2015; 16: 1- 17.

[165]

Smith L , Tanabe LK , Ando RJ , Kuo C-J , Chung I-F , Hsu C-N , et al. Overview of biocreative ii gene mention recognition. Genome Biol. 2008; 9 (S2): 1- 19.

[166]

Collier N , Ohta T , Tsuruoka Y , Tateisi Y , Kim J-D . Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the international joint workshop on natural language processing in biomedicine and its applications (NLPBA/BioNLP). 2004. p. 73- 8.

[167]

Pustejovsky J , Castano J , Sauri X , Zhang J , Luo W . Medstract:creating large-scale information servers from biomedical texts. In:Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain; 2002. p. 85- 92.

[168]

Gasperin C , Karamanis N , Seal R . Annotation of anaphoric relations in biomedical full-text articles using a domain-relevant scheme. In: Proceedings of DAARC; 2007. Citeseer.

[169]

Su J , Yang X , Hong H , Tateisi Y , Tsujii J . Coreference resolution in biomedical texts: a machine learning approach. Ont Text Min Life Sci. 2008; 8.

[170]

Segura-Bedmar I , Crespo M , de Pablo-Sánchez C , Martínez P . Resolving anaphoras for the extraction of drug-drug interactions in pharmacological documents. BMC Bioinf. 2010; 11 (S2): 1- 9.

[171]

Nguyen N , Kim J-D , Tsujii J . Overview of bioNLP 2011 protein coreference shared task. In: Proceedings of BioNLP shared task 2011 workshop; Nat Biotechnol. 2011. p. 74- 82.

[172]

Theresa Batista-Navarro R , Ananiadou S . Building a coreference-annotated corpus from the domain of biochemistry. In:Proceedings of BioNLP 2011 workshop; 2011. p. 83- 91.

[173]

Bretonnel Cohen K , Lanfranchi A , Joo-young Choi M , Bada M , Baumgartner WA , Panteleyeva N , et al. Coreference annotation and resolution in the Colorado richly annotated full text (CRAFT) corpus of biomedical journal articles. BMC Bioinf. 2017; 18: 1- 14.

[174]

Lee J , Yoon W , Kim S , Kim D , Kim S , So CH , et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020; 36 (4): 1234- 40.

[175]

Joshi M , Chen D , Liu Y , Weld DS , Zettlemoyer L , Levy O . SpanBERT:improving pre-training by representing and predicting spans. Trans Assoc Comput Linguist. 2020; 8: 64- 77.

[176]

Baker S , Silins I , Guo Y , Ali I , Högberg J , Stenius U , et al. Automatic semantic classification of scientific literature according to the hallmarks of cancer. Bioinformatics. 2016; 32 (3): 432- 40.

[177]

Radford A , Narasimhan K , Salimans T , Sutskever I . Improving language understanding by generative pre-training. OpenAI. 2018.

[178]

Ji Y , Zhou Z , Liu H , Davuluri RV . DNABERT:pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics. 2021; 37 (15): 2112- 20.

[179]

Sanabria M , Hirsch J , Poetsch AR . The human genome's vocabulary as proposed by the DNA language model GROVER. 2023. Preprint at bioRxiv: 2023.07.19.549677.

[180]

Chen K , Zhou Y , Ding M , Wang Y , Ren Z , Yang Y . Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction. 2023. Preprint at bioRxiv: 2023.01.31.526427.

[181]

Chen J , Hu Z , Sun S , Tan Q , Wang Y , Yu Q , et al. Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. 2022. Preprint at arXiv: 2204.00300.

[182]

Ahmed E , Heinzinger M , Dallago C , Rihawi G , Wang Y , Jones L , et al. ProtTrans:towards cracking the language of life's code through self-supervised deep learning and high performance computing. 2007. Preprint at arXiv: 2007.06225. arXiv 2020.

[183]

Ferruz N , Schmidt S , Höcker B . ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun. 2022; 13 (1): 4348.

[184]

Bryant P , Pozzati G , Elofsson A . Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun. 2022; 13 (1): 1265.

[185]

Abramson J , Adler J , Dunger J , Evans R , Green T , Pritzel A , et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024; 630 (8016): 1- 3.

[186]

Dalla-Torre H , Gonzalez L , Mendoza-Revilla J , Lopez Carranza N , Henryk Grzywaczewski A , Oteri F , et al. Nucleotide transformer:building and evaluating robust foundation models for human genomics. Nat Methods. 2024; 22 (2): 1- 11.

[187]

Zhou Z , Ji Y , Li W , Dutta P , Davuluri R , Liu H . DNABERT-2:efficient foundation model and benchmark for multi-species genome. 2023. Preprint at arXiv: 2306.15006.

[188]

Nguyen E , Poli M , Faizi M , Thomas A , Wornow M , Birch-Sykes C , et al. HyenaDNA:long-range genomic sequence modeling at single nucleotide resolution. Adv Neural Inf Process Syst. 2024; 36.

[189]

Poli M , Massaroli S , Nguyen E , Fu DY , Dao T , Baccus S , et al. Hyena hierarchy:towards larger convolutional language models. In:International conference on machine learning. PMLR; 2023. p. 28043- 78.

[190]

Zhang D , Zhang W , He B , Zhang J , Qin C , Yao J . DnaGPT: a generalized pretrained tool for multiple DNA sequence analysis tasks. 2023. Preprint at bioRxiv: 2023.07.11.548628.

[191]

Zeng W , Gautam A , Huson DH . Mulan-methyl-multiple transformer-based language models for accurate DNA methylation prediction. GigaScience. 2023; 12: giad054.

[192]

Press O , Smith NA , Lewis M . Train short, test long: attention with linear biases enables input length extrapolation. 2021. Preprint at arXiv: 2108.12409.

[193]

Dao T , Fu D , Ermon S , Rudra A , Flashattention CR . Fast and memory-efficient exact attention with IO-awareness. Adv Neural Inf Process Syst. 2022; 35: 16344- 59.

[194]

Zhou Z , Weimin W , Harrison H , Wang J , Lizhen S , Ramana VD , et al. DNABERT-S:pioneering species differentiation with species-aware DNA embeddings. 2024. Preprint at arXiv: 2402.08777.

[195]

Dreos R , Ambrosini G , Cavin Périer R , Bucher P . EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era. Nucleic Acids Res. 2013; 41 (D1): D157- 64.

[196]

ENCODE Project Consortium . An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489 (7414): 57- 74.

[197]

Moore JE , Purcaro MJ , Pratt HE , Epstein CB , Shoresh N , Adrian J , et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020; 583 (7818): 699- 710.

[198]

Harrow J , Frankish A , Gonzalez JM , Tapanari E , Diekhans M , Kokocinski F , et al. GENCODE:the reference human genome annotation for the encode project. Genome Res. 2012; 22 (9): 1760- 74.

[199]

Kalicki CH , Haritaoglu ED . RNABERT: RNA family classification and secondary structure prediction with BERT pretrained on RNA sequences.

[200]

Chen K , Zhou Y , Ding M , Wang Y , Ren Z , Yang Y . Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Briefings Bioinf. 2024; 25 (3): bbae163.

[201]

Zhang Y , Lang M , Jiang J , Gao Z , Xu F , Litfin T , et al. Multiple sequence alignment-based RNA language model and its application to structural inference. Nucleic Acids Res. 2024; 52 (1): e3.

[202]

Yamada K , Hamada M . Prediction of RNA-protein interactions using a nucleotide language model. Bioinform Adv. 2022; 2 (1): vbac023.

[203]

Wright ES . RNAContest:comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA. 2020; 26 (5): 531- 40.

[204]

Jumper J , Evans R , Pritzel A , Green T , Figurnov M , Ronneberger O , et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596 (7873): 583- 9.

[205]

Sweeney BA , Petrov AI , Ribas CE , Finn RD , Bateman A , Szymanski M , et al. RNAcentral 2021:secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res. 2021; 49 (D1): D212- 20.

[206]

Haeussler M , Zweig AS , Tyner C , Speir ML , Rosenbloom KR , Raney BJ , et al. The UCSC genome browser database:2019 update. Nucleic Acids Res. 2019; 47 (D1): D853- 8.

[207]

Pan X , Fang Y , Li X , Yang Y , Shen H-B . RBPsuite:RNA-protein binding sites prediction suite based on deep learning. BMC Genom. 2020; 21: 1- 8.

[208]

Zhang Q , Fan X , Wang Y , Sun M-an , Shao J , Guo D . BPP: a sequence-based algorithm for branch point prediction. Bioinformatics. 2017; 33 (20): 3166- 72.

[209]

Scalzitti N , Kress A , Orhand R , Weber T , Moulinier L , Jeannin-Girardon A , et al. Spliceator:multi-species splice site prediction using convolutional neural networks. BMC Bioinf. 2021; 22: 1- 26.

[210]

Singh J , Hanson J , Paliwal K , Zhou Y . RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun. 2019; 10 (1): 5407.

[211]

Tanay A , Regev A . Scaling single-cell genomics from phenomenology to mechanism. Nature. 2017; 541 (7637): 331- 8.

[212]

Levine D , Asad Rizvi S , Lévy S , Pallikkavaliyaveetil N , Zhang D , Chen X , et al. Cell2sentence:teaching large language models the language of biology. 2023. Preprint at bioRxiv: 2023.09.11.557287.

[213]

Shen H , Liu J , Hu J , Shen X , Zhang C , Wu D , et al. Generative pretraining from large-scale transcriptomes for single-cell deciphering. iScience. 2023; 26 (5): 106536.

[214]

Theodoris CV , Xiao L , Chopra A , Chaffin MD , Sayed ZRA , Hill MC , et al. Transfer learning enables predictions in network biology. Nature. 2023; 618 (7965): 616- 24.

[215]

Cui H , Wang C , Maan H , Pang K , Luo F , Duan N , et al. scGPT:toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods. 2024; 21 (8): 1- 11.

[216]

Yang F , Wang W , Wang F , Fang Y , Tang D , Huang J , et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell. 2022; 4 (10): 852- 66.

[217]

Du J , Jia P , Dai Y , Tao C , Zhao Z , Zhi D . Gene2vec:distributed representation of genes based on co-expression. BMC Genom. 2019; 20 (S1): 7- 15.

[218]

Xu J , Zhang A , Liu F , Chen L , Zhang X . CIForm as a transformer-based model for cell-type annotation of large-scale single-cell RNA-seq data. Briefings Bioinf. 2023; 24 (4): bbad195.

[219]

Chen J , Xu H , Tao W , Chen Z , Zhao Y , Han J-DJ . Transformer for one stop interpretable cell type annotation. Nat Commun. 2023; 14 (1): 223.

[220]

Jiao L , Wang G , Dai H , Li X , Wang S , Song T . scTransSort:transformers for intelligent annotation of cell types by gene embeddings. Biomolecules. 2023; 13 (4): 611.

[221]

Song T , Dai H , Wang S , Wang G , Zhang X , Zhang Y , et al. Transcluster: a cell-type identification method for single-cell RNA-seq data using deep learning based on transformer. Front Genet. 2022; 13: 1038919.

[222]

Preissl S , Gaulton KJ , Ren B . Characterizing cis-regulatory elements using single-cell epigenomics. Nat Rev Genet. 2023; 24 (1): 21- 43.

[223]

Avsec Ž , Agarwal V , Visentin D , Ledsam JR , Grabska-Barwinska A , Taylor KR , et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods. 2021; 18 (10): 1196- 203.

[224]

Gao Z , Liu Q , Zeng W , Jiang R , Wong WH . EpiGePT: a pretrained transformer-based language model for context-specific human epigenomics. Genome Biol. 2024; 25 (1): 1- 30.

[225]

Maxwell RM , Rubin AJ , Flynn RA , Dai C , Khavari PA , Greenleaf WJ , et al. HiChIP:efficient and sensitive analysis of protein-directed genome architecture. Nat Methods. 2016; 13 (11): 919- 22.

[226]

Belton J-M , McCord RP , Gibcus JH , Naumova N , Zhan Y , Dekker J . Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 2012; 58 (3): 268- 76.

[227]

Aebersold R , Mann M . Mass spectrometry-based proteomics. Nature. 2003; 422 (6928): 198- 207.

[228]

Steven RS . An introduction to mass spectrometry-based proteomics. J Proteome Res. 2023; 22 (7): 2151- 71.

[229]

Wang F , Liu C , Li J , Yang F , Song J , Zang T , et al. SPDB: a comprehensive resource and knowledgebase for proteomic data at the single-cell resolution. Nucleic Acids Res. 2024; 52 (D1): D562- 71.

[230]

Ding N , Qu S , Xie L , Li Y , Liu Z , Zhang K , et al. Automating exploratory proteomics research via language models. 2024. Preprint at arXiv: 2411.03743.

[231]

Zhang Q , Ding K , Lyv T , Wang X , Yin Q , Zhang Y , et al. Scientific large language models: a survey on biological & chemical domains. 2024. Preprint at arXiv: 2401.14656.

[232]

Xiao H , Zhou F , Liu X , Liu T , Li Z , Liu X , et al. A comprehensive survey of large language models and multimodal large language models in medicine. 2024. Preprint at arXiv: 2405.08603.

[233]

Rives A , Meier J , Sercu T , Goyal S , Lin Z , Liu J , et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci. 2021; 118 (15): e2016239118.

[234]

Meier J , Rao R , Verkuil R , Liu J , Sercu T , Rives A . Language models enable zero-shot prediction of the effects of mutations on protein function. AdvNeural InfProcess Syst. 2021; 34: 29287- 303.

[235]

Brandes N , Ofer D , Peleg Y , Rappoport N , Linial M . ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics. 2022; 38 (8): 2102- 10.

[236]

Ahmed E , Heinzinger M , Dallago C , Rehawi G , Wang Y , Jones L , et al. ProtTrans:toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell. 2021; 44 (10): 7112- 27.

[237]

Ali M , McCann B , Naik N , Keskar NS , Anand N , Eguchi RR , et al. Progen: language modeling for protein generation. 2020. Preprint at arXiv: 2004.03497.

[238]

Ferruz N , Schmidt S , Höcker B . A deep unsupervised language model for protein design. 2022. Preprint at bioRxiv: 2022.03.09.483666v1.

[239]

Munsamy G , Lindner S , Lorenz P , Ferruz N . ZymCTRL: a conditional language model for the controllable generation of artificial enzymes. In:NeurIPS machine learning in structural biology workshop; 2022.

[240]

Hesslow D , Zanichelli N , Notin P , Poli I , Marks D . Rita: a study on scaling up generative protein sequence models. 2022. Preprint at arXiv: 2205.05789.

[241]

Shuai RW , Ruffolo JA , Gray JJ . Generative language modeling for antibody design. 2021. Preprint at bioRxiv: 2021.12.13.472419v2.

[242]

Sternke M , Karpiak J . ProteinRL: reinforcement learning with generative protein language models for property-directed sequence design. In:NeurIPS 2023 generative AI and biology (GenBio) workshop; 2023.

[243]

Truong T, Jr ., Bepler T . Poet: a generative model of protein families as sequences-of-sequences. Adv Neural Inf Process Syst. 2023; 36: 77379- 415.

[244]

Cao Y , Das P , Chenthamarakshan V , Chen P-Y , Melnyk I , Shen Y . Fold2Seq: a joint sequence (1D)-fold (3D) embedding-based generative model for protein design. In: International conference on machine learning. PMLR; 2021. p. 1261- 71.

[245]

Ram S , Bepler T . Few shot protein generation. 2022. Preprint at arXiv: 2204.01168.

[246]

Sgarbossa D , Lupo U , Bitbol A-F . Generative power of a protein language model trained on multiple sequence alignments. eLife. 2023; 12: e79854.

[247]

Lee M , Felipe Vecchietti L , Jung H , Ro H , Cha M , Kim HM . Protein sequence design in a latent space via model-based reinforcement learning. 2023.

[248]

Zheng Z , Deng Y , Xue D , Zhou Y , Ye F , Gu Q . Structure-informed language models are protein designers. In: International Conference on Machine Learning (ICML). ICML; 2023.

[249]

Zhang L , Chen J , Shen T , Li Y , Sun S . Enhancing the protein tertiary structure prediction by multiple sequence alignment generation. 2023. Preprint at arXiv: 2306.01824.

[250]

Heinzinger M , Weissenow K , Gomez Sanchez J , Henkel A , Steinegger M , Rost B . Bilingual language model for protein sequence and structure. NAR Genomics and Bioinformatics. 2024; 6 (4): lqae150.

[251]

Chen B , Cheng X , Li P , Geng Y-ao , Gong J , Li S , et al. xTrimoPGLM: unified 100b-scale pre-trained transformer for deciphering the language of protein. 2024. Preprint at arXiv: 2401.06199.

[252]

Serrano Y , Roda S , Guallar V , Molina A . Efficient and accurate sequence generation with small-scale protein language models. 2023. Preprint at bioRxiv: 2023.08.04.551626v1.

[253]

Simon KSC , Wei KY . Generative antibody design for complementary chain pairing sequences through encoder-decoder language model. 2023. Preprint at arXiv: 2301.02748.

[254]

Lee Y , Yu H , Lee J , Kim J . Pre-training sequence, structure, and surface features for comprehensive protein representation learning. In: The twelfth international conference on learning representations; 2023.

[255]

Nguyen VTD , Son Hy T . Multimodal pretraining for unsupervised protein representation learning. Biol Methods Protoc. 2024; 9 (1): bpae043.

[256]

Lin Z , Akin H , Rao R , Hie B , Zhu Z , Lu W , et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. 2022. Preprint at bioRxiv: 2022.07.20.500902v1.

[257]

Durham J , Zhang J , Humphreys IR , Pei J , Cong Q . Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci. 2023; 48 (6): 527- 38.

[258]

AlQuraishi M . AlphaFold at CASP13. Bioinformatics. 2019; 35 (22): 4862- 5.

[259]

Agarwal V , McShan AC . The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat Chem Biol. 2024; 20 (8): 950- 9.

[260]

Jha K , Karmakar S , Saha S . Graph-BERT and language model-based framework for protein-protein interaction identification. Sci Rep. 2023; 13 (1): 5663.

[261]

Li X , Han P , Chen W , Gao C , Wang S , Song T , et al. MARPPI:boosting prediction of protein-protein interactions with multi-scale architecture residual network. Briefings Bioinf. 2023; 24 (1): bbac524.

[262]

Lee JM , Hammarén HM , Savitski MM , Baek SH . Control of protein stability by post-translational modifications. Nat Commun. 2023; 14 (1): 201.

[263]

Shrestha P , Kandel J , Tayara H , Chong KT . Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model. Nat Commun. 2024; 15 (1): 6699.

[264]

Esmaili F , Pourmirzaei M , Ramazi S , Shojaeilangari S , Yavari E . A review of machine learning and algorithmic methods for protein phosphorylation site prediction. Genom Proteom Bioinform. 2023; 21 (6): 1266- 85.

[265]

Bertoline LMF , Lima AN , Krieger JE , Teixeira SK . Before and after AlphaFold2: an overview of protein structure prediction. Front Bioinform. 2023; 3: 1120370.

[266]

Kim G , Lee S , Levy Karin E , Kim H , Moriwaki Y , Ovchinnikov S , et al. Easy and accurate protein structure prediction using colabFold. Nat Protoc. 2024; 20 (3): 1- 23.

[267]

Jing B , Erives E , Pao-Huang P , Corso G , Berger B , Jaakkola T . EigenFold:generative protein structure prediction with diffusion models. 2023. Preprint at arXiv: 2304.02198.

[268]

Bateman A , Martin MJ , Orchard S , Magrane M , Ahmad S , Alpi E , et al. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023; 51 (D1): D523- 31.

[269]

Nguyen T , Wriggers W , He J . A data set of paired structural segments between protein data bank and AlphaFold DB for medium-resolution cryo-em density maps: a gap in overall structural quality. In:International symposium on bioinformatics research and applications. Springer. 2024. p. 52- 63.

[270]

Lee GH , Min CW , Jang JW , Gupta R , Kim ST . Dataset on post-translational modifications proteome analysis of msp1-overexpressing rice leaf proteins. Data Brief. 2023; 50: 109573.

[271]

Polina L , Weindl D . Dynamic models for metabolomics data integration. Curr Opin Syst Biol. 2021; 28: 100358.

[272]

Tian L , Yu T . An integrated deep learning framework for the interpretation of untargeted metabolomics data. Briefings Bioinf. 2023; 24 (4): bbad244.

[273]

Kaddour J , Harris J , Mozes M , Bradley H , Raileanu R , McHardy R . Challenges and applications of large language models. 2023. Preprint at arXiv: 2307.10169.

[274]

Vu T , Siemek P , Bhinderwala F , Xu Y , Powers R . Evaluation of multivariate classification models for analyzing NMR metabolomics data. J Proteome Res. 2019; 18 (9): 3282- 94.

[275]

Mao C , Xu J , Rasmussen L , Li Y , Adekkanattu P , Pacheco J , et al. AD-BERT:using pre-trained language model to predict the progression from mild cognitive impairment to Alzheimer's disease. J Biomed Inf. 2023; 144: 104442.

[276]

Feng Y , Xu X , Zhuang Y , Zhang M . Large language models improve Alzheimer's disease diagnosis using multi-modality data. In: 2023 IEEE international conference on medical artificial intelligence (MedAI). IEEE; 2023. p. 61- 6.

[277]

Xie K , Gallagher RS , Conrad EC , Garrick CO , Baldassano SN , Bernabei JM , et al. Extracting seizure frequency from epilepsy clinic notes: a machine reading approach to natural language processing. J Am Med Inf Assoc. 2022; 29 (5): 873- 81.

[278]

Koga S , Martin NB , Dickson DW . Evaluating the performance of large language models:ChatGPT and Google bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders. Brain Pathol. 2024; 34 (3): e13207.

[279]

Le Guellec B , Lefèvre A , Geay C , Shorten L , Bruge C , Hacein-Bey L , et al. Performance of an open-source large language model in extracting information from free-text radiology reports. Radiology:Artif Intell. 2024; 6 (4): e230364.

[280]

Kanzawa J , Yasaka K , Fujita N , Fujiwara S , Abe O . Automated classification of brain MRI reports using fine-tuned large language models. Neuroradiology. 2024; 66 (12): 1- 7.

[281]

Valsaraj A , Madala I , Garg N , Baths V . Alzheimer's dementia detection using acoustic & linguistic features and pre-trained BERT. In:2021 8th international conference on soft computing & machine intelligence (ISCMI). IEEE. 2021. p. 171- 5.

[282]

Anand Vats N , Yadavalli A , Gurugubelli K , Kumar Vuppala A . Acoustic features, BERT model and their complementary nature for Alzheimer's dementia detection. In:Proceedings of the 2021 thirteenth international conference on contemporary computing. 2021. p. 267- 72.

[283]

Bang J-U , Han S-H , Kang B-O . Alzheimer's disease recognition from spontaneous speech using large language models. ETRI J. 2024; 46 (1): 96- 105.

[284]

Agbavor F , Liang H . Predicting dementia from spontaneous speech using large language models. PLoS Digit Health. 2022; 1 (12): e0000168.

[285]

Cong Y , LaCroix AN , Lee J . Clinical efficacy of pre-trained large language models through the lens of aphasia. Sci Rep. 2024; 14 (1): 15573.

[286]

Van Veen D , Van Uden C , Blankemeier L , Delbrouck J-B , Aali A , Bluethgen C , et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med. 2024; 30 (4): 1134- 42.

[287]

Lee J-H , Choi E , McDougal R , Lytton WW . GPT-4 performance for neurologic localization. Neurol Clin Pract. 2024; 14 (3): e200293.

[288]

Kwon T , Tzu-iunn Ong K , Kang D , Moon S , Lee JR , Hwang D , et al. Large language models are clinical reasoners:reasoning-aware diagnosis framework with prompt-generated rationales. Proc AAAI Conf Artif Intell. 2024; 38 (16): 18417- 25.

[289]

Akiyama M , Sakakibara Y . Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform. 2022; 4 (1): lqac012.

[290]

Xu M , Yuan X , Miret S , Tang J . ProtST: multi-modality learning of protein sequences and biomedical texts. In: International conference on machine learning. PMLR; 2023. p. 38749- 67.

[291]

Liu Q , Zeng W , Zhu H , Li L , Wong WH , Alzheimer's Disease Neuroimaging Initiative . Leveraging genomic large language models to enhance causal genotype-brain-clinical pathways in Alzheimer's disease. 2024. Preprint at medRxiv 2024.10.03. 24314824.

[292]

Frank M , Ni P , Jensen M , Gerstein MB . Leveraging a large language model to predict protein phase transition: a physical, multiscale, and interpretable approach. Proc Natl Acad Sci. 2024; 121 (33): e2320510121.

[293]

Kim JW , Ahmed A , Bernardo D . EEG-GPT: exploring capabilities of large language models for EEG classification and interpretation. 2024. Preprint at arXiv: 2401.18006.

[294]

Liu M , Song Z , Chen D , Wang X , Zhuang Z , Fei M , et al. Affinity learning based brain function representation for disease diagnosis. In: International conference on medical image computing and computer-assisted intervention. Springer; 2024. p. 14- 23.

[295]

Ali B , Hashemi F . Brain-mamba: encoding brain activity via selective state space models. In: Conference on health, inference, and learning. PMLR; 2024. p. 233- 50.

[296]

Jan Holwerda T , Deeg DJH , Beekman ATF , Van Tilburg TG , Stek ML , Jonker C , et al. Feelings of loneliness, but not social isolation, predict dementia onset: results from the Amsterdam Study of the Elderly (AMSTEL). J Neurol Neurosurg Psychiatr. 2014; 85 (2): 135- 42.

[297]

Xiang Q . ChatGPT: a promising tool to combat social isolation and loneliness in older adults with mild cognitive impairment. Neurol Live. 2023.

[298]

Raile P . The usefulness of ChatGPT for psychotherapists and patients. Humanit Soc Sci Commun. 2024; 11 (1): 1- 8.

[299]

Ali Mohammed I , Venkataraman S . An innovative study for the development of a wearable AI device to monitor Parkinson's disease using generative AI and LLM techniques. Int J Creat Res Thoughts. 2023: 2320- 882.

[300]

Binta Manir S , Islam KMS , Madiraju P , Deshpande P . LLM-based text prediction and question answer models for aphasia speech. IEEE Access. 2024; 12: 114670- 80.

[301]

Liberati G , Rocha JLDD , Van der Heiden L , Raffone A , Birbaumer N , Belardinelli MO , et al. Toward a brain-computer interface for Alzheimer's disease patients by combining classical conditioning and brain state classification. J Alzheim Dis. 2012; 31 (s3): S211- 20.

[302]

Miladinović A , Ajčević M , Busan P , Jarmolowska J , Silveri G , Deodato M , et al. Evaluation of motor imagery-based BCI methods in neurorehabilitation of Parkinson's disease patients. In: 2020 42nd annual international conference of the IEEE engineering in medicine & biology society (EMBC). IEEE; 2020. p. 3058- 61.

[303]

Li Z , Zhao S , Duan J , Su C-Y , Yang C , Zhao X . Human cooperative wheelchair with brain-machine interaction based on shared control strategy. IEEE ASME Trans Mechatron. 2016; 22 (1): 185- 95.

[304]

Cao Z . A review of artificial intelligence for EEG-based brain-computer interfaces and applications. Brain Sci Adv. 2020; 6 (3): 162- 70.

[305]

Sorino P , Biancofiore GM , Lofù D , Colafiglio T , Lombardi A , Narducci F , et al. ARIEL: brain-computer interfaces meet large language models for emotional support conversation. In: Adjunct proceedings of the 32nd ACM conference on user modeling, adaptation and personalization; 2024. p. 601- 9.

[306]

Jiménez Benetó DM . Arithmetic reasoning in large language models and a speech brain-computer interface. B.S. thesis. Universitat Politècnica de Catalunya; 2024.

[307]

Reza Saeidnia H , Kozak M , Brady DL , Hassanzadeh M . Evaluation of ChatGPT's responses to information needs and information seeking of dementia patients. Sci Rep. 2024; 14 (1): 10273.

[308]

Hristidis V , Ruggiano N , Brown EL , Ganta SRR , Stewart S . ChatGPT vs Google for queries related to dementia and other cognitive decline: comparison of results. J Med Internet Res. 2023; 25: e48966.

[309]

Wu C , Lin W , Zhang X , Zhang Y , Xie W , Wang Y . PMC-LLaMA: toward building open-source language models for medicine. J Am Med Inf Assoc. 2024; 31 (9): 1833- 43.

[310]

Oh Y , Park S , Byun HK , Cho Y , Lee IJ , Kim JS , et al. LLM-driven multimodal target volume contouring in radiation oncology. Nat Commun. 2024; 15 (1): 9186.

[311]

Oh Y , Park S , Xiang L , Yi W , Paly J , Efstathiou J , et al. Mixture of multicenter experts in multimodal generative AI for advanced radiotherapy target delineation. 2024. Preprint at arXiv: 2410.00046.

[312]

Wang P , Liu Z , Li Y , Holmes J , Shu P , Zhang L , et al. Fine-tuning large language models for radiation oncology, a specialized health care domain. Int J Radiat Oncol Biol Phys. 2024; 120 (2): e664.

[313]

Oh Y , Park S , Byun HK , Cho Y , Lee IJ , Kim JS , et al. LLM-driven multimodal target volume contouring in radiation oncology. Nat Commun. 2024; 15 (1): 9186.

[314]

Dong Z , Chen Y , Gay H , Yao H , Hugo GD , Samson P , et al. Large-language-model empowered 3D dose prediction for intensity-modulated radiotherapy. Med Phys. 2024; 52 (1): 619- 32.

[315]

Yuexing H , Holmes JM , Hobson J , Bennett A , Ebner DK , Routman DM , et al. Retrospective comparative analysis of prostate cancer in-basket messages: responses from closed-domain LLM vs. clinical teams. 2024. Preprint at arXiv: 2409.18290.

[316]

Wang P , Holmes J , Liu Z , Chen D , Liu T , Shen J , et al. A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options. Front Oncol. 2025; 15: 1557064.

[317]

Trtica-Majnaric L , Zekic-Susac M , Sarlija N , Vitale B . Prediction of influenza vaccination outcome by neural networks and logistic regression. J Biomed Inf. 2010; 43 (5): 774- 81.

[318]

Alhasan K , Al-Tawfiq J , Aljamaan F , Jamal A , Al-Eyadhy A , Temsah M-H . Mitigating the burden of severe pediatric respiratory viruses in the post-COVID-19 era: ChatGPT insights and recommendations. Cureus. 2023; 15 (3): e36263.

[319]

Hung S-K , Wu C-C , Singh A , Li J-H , Lee C , Chou EH , et al. Developing and validating clinical features-based machine learning algorithms to predict influenza infection in influenza-like illness patients. Biomed J. 2023; 46 (5): 100561.

[320]

Subba B , Toufiq M , Omi F , Yurieva M , Khan T , Rinchai D , et al. Large language model-driven selection of glutathione peroxidase 4 as a candidate blood transcriptional biomarker for circulating erythroid cells. 2024.

[321]

Du J , Xiang Y , Sankaranarayanapillai M , Zhang M , Wang J , Si Y , et al. Extracting postmarketing adverse events from safety reports in the vaccine adverse event reporting system (VAERS) using deep learning. J Am Med Inf Assoc. 2021; 28 (7): 1393- 400.

[322]

Shah SAW , Palomar DP , Barr I , Poon LLM , Ahmed AQ , McKay MR . Seasonal antigenic prediction of influenza A H3N2 using machine learning. Nat Commun. 2024; 15 (1): 3833.

[323]

Wu H , Li M , Zhang L . Comparing physician and large language model responses to influenza patient questions in the online health community. Int J Med Inf. 2025; 197: 105836.

[324]

Huang X , Smith MC , Jamison AM , Broniatowski DA , Dredze M , Quinn SC , et al. Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013-2017. BMJ Open. 2019; 9 (1): e024018.

[325]

Li Y , Li J , He J , Tao C . AE-GPT: using large language models to extract adverse events from surveillance reports-a use case with influenza vaccine adverse events. PLoS One. 2024; 19 (3): e0300919.

[326]

Mohamed Ghazy R , Elkhadry SW , Abdel-Rahman S , Taha SHN , Youssef N , Elshabrawy A , et al. External validation of the parental attitude about childhood vaccination scale. Front Public Health. 2023; 11: 1146792.

[327]

Sammut F , Suda D , Caruana MA , Bogolyubova O . COVID-19 vaccination attitudes across the European continent. Heliyon. 2023; 9 (8): e18903.

[328]

Skyles TJ , Stevens HP , Davis SC , Obray AM , Miner DS , East MJ , et al. Comparison of predictive factors of flu vaccine uptake pre- and post-COVID-19 using the NIS-teen survey. Vaccines. 2024; 12 (10): 1164.

[329]

Ahmad ST , Lu H , Liu S , Lau A , Amin B , Dras M , et al. VaxGuard: a multi-generator, multi-type, and multi-role dataset for detecting LLM-generated vaccine misinformation. 2025. Preprint at arXiv: 2503.09103.

[330]

Sun VH , Heemelaar JC , Hadzic I , Raghu VK , Wu C-Y , Zubiri L , et al. Enhancing precision in detecting severe immune-related adverse events: comparative analysis of large language models and international classification of disease codes in patient records. J Clin Oncol. 2024; 42 (35): 4134- 44.

[331]

Boubnovski Martell M , Märtens K , Phillips L , Keitley D , Dermit M , Fauqueur J . A scalable LLM framework for therapeutic biomarker discovery: grounding Q/A generation in knowledge graphs and literature. In: ICLR 2025 workshop on machine learning for genomics explorations.

[332]

McIlwain DR , Chen H , Rahil Z , Bidoki NH , Jiang S , Bjornson Z , et al. Human influenza virus challenge identifies cellular correlates of protection for oral vaccination. Cell Host Microbe. 2021; 29 (12): 1828- 37.e5.

[333]

Hayati M , Sobkowiak B , Stockdale JE , Colijn C . Phylogenetic identification of influenza virus candidates for seasonal vaccines. Sci Adv. 2023; 9 (44): eabp9185.

[334]

Gao C , Wen F , Guan M , Hatuwal B , Li L , Praena B , et al. MAIVeSS: streamlined selection of antigenically matched, high-yield viruses for seasonal influenza vaccine production. Nat Commun. 2024; 15 (1): 1128.

[335]

Montin D , Santilli V , Beni A , Costagliola G , Martire B , Mastrototaro MF , et al. Towards personalized vaccines. Front Immunol. 2024; 15: 1436108.

[336]

Lee EK , Tian H , Nakaya HI . Antigenicity prediction and vaccine recommendation of human influenza virus A (H3N2) using convolutional neural networks. Hum Vaccines Immunother. 2020; 16 (11): 2690- 708.

[337]

Meaney C , Escobar M , Stukel TA , Austin PC , Jaakkimainen L . Comparison of methods for estimating temporal topic models from primary care clinical text data: retrospective closed cohort study. JMIR Med Inform. 2022; 10 (12): e40102.

[338]

Valerio V , Rampakakis E , Zanos TP , Levy TJ , Shen HC , McDonald EG , et al. High frequency of COVID-19 vaccine hesitancy among Canadians immunized for influenza: a cross-sectional survey. Vaccines. 2022; 10 (9): 1514.

[339]

Ng QX , Ng CX , Ong C , Lee DYX , Liew TM . Examining public messaging on influenza vaccine over social media: unsupervised deep learning of 235,261 Twitter posts from 2017 to 2023. Vaccines. 2023; 11 (10): 1518.

[340]

Ng QX , Lee DYX , Ng CX , Yau CE , Lim YL , Liew TM . Examining the negative sentiments related to influenza vaccination from 2017 to 2022: an unsupervised deep learning analysis of 261,613 Twitter posts. Vaccines. 2023; 11 (6): 1018.

[341]

Levi Y , Brandeau ML , Shmueli E , Yamin D . Prediction and detection of side effects severity following COVID-19 and influenza vaccinations: utilizing smartwatches and smartphones. Sci Rep. 2024; 14 (1): 6012.

[342]

Deady M , Hussein E , Cook K , Billings D , Pizarro J , Plotogea AA , et al. The food and drug administration biologics effectiveness and safety initiative facilitates detection of vaccine administrations from unstructured data in medical records through natural language processing. Front Digit Health. 2021; 3: 777905.

[343]

Zimmermann MT , Kennedy RB , Grill DE , Oberg AL , Goergen KM , Ovsyannikova IG , et al. Integration of immune cell populations, mRNA-Seq, and CpG methylation to better predict humoral immunity to influenza vaccination: dependence of mRNA-Seq/CpG methylation on immune cell populations. Front Immunol. 2017; 8: 445.

[344]

Galvan D , Effting L , Cremasco H , Adam Conte-Junior C . Can socioeconomic, health, and safety data explain the spread of COVID-19 outbreak on Brazilian Federative Units? Int J Environ Res Publ Health. 2020; 17 (23): 8921.

[345]

Wooden SL , Koff WC . The Human Vaccines Project: towards a comprehensive understanding of the human immune response to immunization. Hum Vaccines Immunother. 2018; 14 (9): 2214- 6.

[346]

Hadid A , McDonald EG , Cheng MP , Papenburg J , Libman M , Dixon PC , et al. The WE SENSE study protocol: a controlled, longitudinal clinical trial on the use of wearable sensors for early detection and tracking of viral respiratory tract infections. Contemp Clin Trials. 2023; 128: 107103.

[347]

Pletz MW , Vestergaard Jensen A , Bahrs C , Davenport C , Rupp J , Witzenrath M , et al. Unmet needs in pneumonia research: a comprehensive approach by the CAPNETZ study group. Respir Res. 2022; 23 (1): 239.

[348]

Liu J , Niu Q , Nagai-Tanima M , Aoyama T . Understanding human papillomavirus vaccination hesitancy in Japan using social media: content analysis. J Med Internet Res. 2025; 27: e68881.

[349]

Berdigaliyev N , Aljofan M . An overview of drug discovery and development. Future Med Chem. 2020; 12 (10): 939- 47.

[350]

Cummings J , Zhou Y , Lee G , Zhong K , Fonseca J , Cheng F . Alzheimer's disease drug development pipeline: 2024. Alzheimer's Dement: Transl Res Clin Interv. 2024; 10 (2): e12465.

[351]

Sadybekov AV , Katritch V . Computational approaches streamlining drug discovery. Nature. 2023; 616 (7958): 673- 85.

[352]

Wang J , Xiao Y , Shang X , Peng J . Predicting drug-target binding affinity with cross-scale graph contrastive learning. Briefings Bioinf. 2024; 25 (1): bbad516.

[353]

Frey NC , Soklaski R , Axelrod S , Samsi S , Gómez-Bombarelli R , Coley CW , et al. Neural scaling of deep chemical models. Nat Mach Intell. 2023; 5 (11): 1297- 305.

[354]

Huang K , Chandak P , Wang Q , Havaldar S , Vaid A , Leskovec J , et al. A foundation model for clinician-centered drug repurposing. Nat Med. 2024; 30 (12): 3601- 13.

[355]

Singhal K , Azizi S , Tu T , Sara Mahdavi S , Wei J , Chung HW , et al. Large language models encode clinical knowledge. Nature. 2023; 620 (7972): 172- 80.

[356]

Zdrazil B , Felix E , Hunter F , Manners EJ , Blackshaw J , Corbett S , et al. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024; 52 (D1): D1180- 92.

[357]

Koscielny G , An P , Carvalho-Silva D , Cham JA , Fumis L , Gasparyan R , et al. Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 2017; 45 (D1): D985- 94.

[358]

Wishart DS , Feunang YD , Guo AC , Lo EJ , Marcu A , Grant JR , et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018; 46 (D1): D1074- 82.

[359]

Pun FW , Ozerov IV , Zhavoronkov A . AI-powered therapeutic target discovery. Trends Pharmacol Sci. 2023; 44 (9): 561- 72.

[360]

Savage N . Drug discovery companies are customizing ChatGPT: here's how. Nat Biotechnol. 2023; 41 (5): 585- 6.

[361]

Sheikholeslami M , Mazrouei N , Gheisari Y , Fasihi A , Irajpour V , Ali M , et al. DrugGen: advancing drug discovery with large language models and reinforcement learning feedback. 2024. Preprint at arXiv: 2411.14157.

[362]

Bran AM , Cox S , Schilter O , Baldassari C , White AD , Schwaller P . Augmenting large language models with chemistry tools. Nat Mach Intell. 2024; 6 (5): 525- 35.

[363]

Boiko DA , MacKnight R , Kline B , Gomes G . Autonomous chemical research with large language models. Nature. 2023; 624 (7992): 570- 8.

[364]

ValizadehAslani T , Shi Y , Ren P , Wang J , Zhang Y , Hu M , et al. PharmBERT: a domain-specific BERT model for drug labels. Briefings Bioinf. 2023; 24 (4): bbad226.

[365]

Chaves JMZ , Wang E , Tu T , Dhaval Vaishnav E , Lee B , Sara Mahdavi S , et al. Tx-LLM: a large language model for therapeutics. 2024. Preprint at arXiv: 2406.06316.

[366]

Singh R , Sledzieski S , Bryson B , Cowen L , Berger B . Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci USA. 2023; 120 (24): e2220778120.

[367]

Yang Z , Liu J , Yang F , Zhang X , Zhang Q , Zhu X , et al. Advancing Drug-Target Interaction prediction with BERT and subsequence embedding. Comput Biol Chem. 2024; 110: 108058.

[368]

Kalakoti Y , Yadav S , Sundar D . TransDTI: transformer-based language models for estimating DTIs and building a drug recommendation workflow. ACS Omega. 2022; 7 (3): 2706- 17.

[369]

Luo Z , Wu W , Sun Q , Wang J . Accurate and transferable drug-target interaction prediction with DrugLAMP. Bioinformatics. 2024; 40 (12): btae693.

[370]

Bal R , Xiao Y , Wang W . PGraphDTA: improving drug target interaction prediction using protein language models and contact maps. 2024. Preprint at arXiv: 2310.04017.

[371]

Fan Q , Liu Y , Zhang S , Ning X , Xu C , Han W , et al. CGPDTA: an explainable transfer learning-based predictor with molecule substructure graph for drug-target binding affinity. J Comput Chem. 2025; 46 (1): e27538.

[372]

Liang Y , Zhang R , Li Y , Huo M , Ma Z , Singh D , et al. Multi-modal large language model enables all-purpose prediction of drug mechanisms and properties. 2024. Preprint at bioRxiv 2024.09.29.615524.

[373]

Ma T , Lin X , Li T , Li C , Chen L , Zhou P , et al. Y-Mol: a multiscale biomedical knowledge-guided large language model for drug development. 2024. Preprint at arXiv: 2410.11550.

[374]

Inoue Y , Song T , Fu T . DrugAgent: explainable drug repurposing agent with large language model-based reasoning. 2024. Preprint at arXiv: 2408.13378.

[375]

Davis AP , Grondin CJ , Johnson RJ , Sciaky D , Wiegers J , Wiegers TC , et al. Comparative toxicogenomics database (CTD): update 2021. Nucleic Acids Res. 2021; 49 (D1): D1138- 43.

[376]

Chen L , Fan Z , Chang J , Yang R , Hou H , Guo H , et al. Sequence-based drug design as a concept in computational drug design. Nat Commun. 2023; 14 (1): 4217.

[377]

Zhang S , Xie L . Protein language model-powered 3D ligand binding site prediction from protein sequence. In: NeurIPS 2023 AI for science workshop; 2023.

[378]

Fang X , Wang F , Liu L , He J , Lin D , Xiang Y , et al. A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. Nat Mach Intell. 2023; 5 (10): 1087- 96.

[379]

Chakraborty C , Bhattacharya M , Lee S-S . Artificial intelligence enabled ChatGPT and large language models in drug target discovery, drug discovery, and development. Mol Ther Nucleic Acids. 2023; 33: 866- 8.

[380]

Shaker B , Ahmad S , Lee J , Jung C , Na D . In silico methods and tools for drug discovery. Comput Biol Med. 2021; 137: 104851.

[381]

Sharma G , Thakur A . ChatGPT in drug discovery. 2023.

[382]

Morris GM , Huey R , Olson AJ . Using AutoDock for ligand-receptor docking. Curr Protoc Bioinform. 2008; 24 (1): 8- 14.

[383]

Liang Y , Zhang R , Zhang L , Xie P . DrugChat: towards enabling ChatGPT-like capabilities on drug molecule graphs. 2023. Preprint at arXiv: 2309.03907.

[384]

Shen C , Zhang X , Deng Y , Gao J , Wang D , Xu L , et al. Boosting protein-ligand binding pose prediction and virtual screening based on residue-atom distance likelihood potential and graph transformer. J Med Chem. 2022; 65 (15): 10691- 706.

[385]

Trott O , Olson AJ . AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010; 31 (2): 455- 61.

[386]

Bao J , He X , Zhang JZH . DeepBSP-a machine learning method for accurate prediction of protein-ligand docking structures. J Chem Inf Model. 2021; 61 (5): 2231- 40.

[387]

Méndez-Lucio O , Ahmad M , del Rio-Chanona EA , Wegner JK . A geometric deep learning approach to predict binding conformations of bioactive molecules. Nat Mach Intell. 2021; 3 (12): 1033- 9.

[388]

Niu Z , Xiao X , Wu W , Cai Q , Jiang Y , Jin W , et al. PharmaBench: enhancing ADMET benchmarks with large language models. Sci Data. 2024; 11 (1): 985.

[389]

Gawande MS , Zade N , Kumar P , Gundewar S , Nayodhara Weerarathna I , Verma P . The role of artificial intelligence in pandemic responses: from epidemiological modeling to vaccine development. Mol Biomed. 2025; 6 (1): 1.

[390]

Anderson LN , Hoyt CT , Zucker JD , McNaughton AD , Teuton JR , Karis K , et al. Computational tools and data integration to accelerate vaccine development: challenges, opportunities, and future directions. Front Immunol. 2025; 16: 1502484.

[391]

Tomic A , Tomic I , Rosenberg-Hasson Y , Dekker CL , Maecker HT , Davis MM . SIMON, an automated machine learning system, reveals immune signatures of influenza vaccine responses. J Immunol. 2019; 203 (3): 749- 59.

[392]

Hayawi K , Shahriar S , Alashwal H , Serhani MA . Generative AI and large language models: a new frontier in reverse vaccinology. Inform Med Unlocked. 2024; 48: 101533.

[393]

Luciani LL , Miller LM , Zhai B , Clarke K , Kramer KH , Schratz LJ , et al. Blood inflammatory biomarkers differentiate inpatient and outpatient coronavirus disease 2019 from influenza. Open Forum Infect Dis. 2023; 10 (3): ofad095.

[394]

McGarvey PB , Suzek BE , Baraniuk JN , Rao S , Conkright B , Lababidi S , et al. In silico analysis of autoimmune diseases and genetic relationships to vaccination against infectious diseases. BMC Immunol. 2014; 15 (1): 61.

[395]

Kim M , Kim YJ , Park SJ , Kim KG , Oh PC , Kim YS , et al. Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease. BMC Cardiovasc Disord. 2021; 21 (1): 129.

[396]

Cotugno N , Santilli V , Pascucci GR , Manno EC , De Armas L , Pallikkuth S , et al. Artificial intelligence applied to in vitro gene expression testing (IVIGET) to predict trivalent inactivated influenza vaccine immunogenicity in HIV infected children. Front Immunol. 2020; 11: 559590.

[397]

Furman D , Jojic V , Kidd B , Shen-Orr S , Price J , Jarrell J , et al. Apoptosis and other immune biomarkers predict influenza vaccine responsiveness. Mol Syst Biol. 2013; 9 (1): 659.

[398]

Ruga T , Vocaturo E , Zumpano E . On the role of LLM to forecast the next pandemic. In: 2024 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE; 2024. p. 6567- 73.

[399]

Raj Saxena R . Examining reactions about COVID-19 vaccines: a systematic review of studies utilizing deep learning for sentiment analysis. 2024. Preprint at Authorea.

[400]

Hou AB , Du H , Wang Y , Zhang J , Wang Z , Liang PP , et al. Can a society of generative agents simulate human behavior and inform public health policy? A case study on vaccine hesitancy. 2025. Preprint at arXiv: 2503.09639.

[401]

Sutskever I , Vinyals O , Le QV . Sequence to sequence learning with neural networks. 2014. Preprint at arXiv: 1409.3215.

RIGHTS & PERMISSIONS

© The Author(s). Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.
