
Foundation models in molecular biology
Yunda Si, Jiawei Zou, Yicheng Gao, Guohui Chuai, Qi Liu, Luonan Chen
Biophysics Reports, 2024, Vol. 10, Issue 3: 135–151.
Determining correlations between molecules at various levels is a central topic in molecular biology. In natural language processing and image generation, large language models have shown a remarkable ability to capture correlations from large amounts of data. Because the correlations they capture can be applied to a wide range of specific tasks, such models are also referred to as foundation models. The massive amount of data available in molecular biology provides an excellent basis for developing foundation models, and their recent emergence has substantially advanced the field. We summarize the foundation models developed from RNA sequence data, DNA sequence data, protein sequence data, single-cell transcriptome data, and spatial transcriptome data, respectively, and further discuss research directions for the development of foundation models in molecular biology.
Foundation models / Molecular biology / Transcriptome