
Perspectives on benchmarking foundation models for network biology
Christina V. Theodoris
Quant. Biol. 2024, Vol. 12, Issue 4: 335–338.
Transfer learning has revolutionized fields including natural language understanding and computer vision by leveraging large-scale general datasets to pretrain models with foundational knowledge that can then be transferred to improve predictions in a vast range of downstream tasks. More recently, transfer learning approaches have been increasingly adopted in the biological sciences, where models pretrained on massive amounts of biological data are employed to make predictions in a broad range of biological applications. However, unlike natural language, where humans are well suited to evaluate models given a clear understanding of the ground truth, biology presents the unique challenge of a setting with a plethora of unknowns that must nevertheless abide by real-world physical constraints. This perspective discusses key points the field should consider when designing benchmarks for foundation models in network biology.
benchmarking strategy / foundation models / network biology / transfer learning
[1] Vaswani A, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017.
[2] Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT 2019, Vol. 1; 2019. p. 4174–86.
[3] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 770–8.
[4] Fudenberg G, Kelley DR, Pollard KS. Predicting 3D genome folding from DNA sequence with Akita. Nat Methods. 2020;17(11):1111–7.
[5] Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.
[6] Avsec Ž, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods. 2021;18(10):1196–203.
[7] Yang F, Wang W, Wang F, Fang Y, Tang D, Huang J, et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell. 2022;4(10):852–66.
[8] Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379.
[9] Theodoris CV, Xiao L, Chopra A, Chaffin MD, Al Sayed ZR, Hill MC, et al. Transfer learning enables predictions in network biology. Nature. 2023;618(7965):616–24.
[10] Shen H, Liu J, Hu J, Shen X, Zhang C, Wu D, et al. Generative pretraining from large-scale transcriptomes for single-cell deciphering. iScience. 2023;26(5):106536.
[11] Cui H, Wang C, Maan H, Pang K, Luo F, Duan N, et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods. 2024:1–11.
[12] Hao M, Gong J, Zeng X, Liu C, Guo Y, Cheng X, et al. Large-scale foundation model on single-cell transcriptomics. Nat Methods. 2024:1–11.
[13] Nguyen E, Poli M, Faizi M, Thomas A, Birch-Sykes C, Wornow M, et al. HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution. 2023. Preprint at arXiv:2306.15794.
[14] Linder J, Srivastava D, Yuan H, Agarwal V, Kelley DR. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. 2023. Preprint at bioRxiv:2023.08.30.555582.
[15] Yang X, Liu G, Feng G, Bu D, Wang P, Jiang J, et al. GeneCompass: deciphering universal gene regulatory mechanisms with knowledge-informed cross-species foundation model. 2023. Preprint at bioRxiv:2023.09.26.559542.
[16] Rosen Y, Roohani Y, Agarwal A, Samotorčan L, Tabula Sapiens Consortium, Quake SR, et al. Universal cell embeddings: a foundation model for cell biology. 2023. Preprint at bioRxiv:2023.11.28.568918.
[17] Bian H, Chen Y, Dong X, Li C, Hao M, Chen S, et al. scMulan: a multitask generative pre-trained language model for single-cell analysis. 2024. Preprint at bioRxiv:2024.01.25.577152.
[18] Nguyen E, Poli M, Durrant MG, Thomas AW, Kang B, Sullivan J, et al. Sequence modeling and design from molecular to genome scale with Evo. 2024. Preprint at bioRxiv:2024.02.27.582234.
[19] Schaar AC, Tejada-Lapuerta A, Palla G, Gutgesell R, Halle L, Minaeva M, et al. Nicheformer: a foundation model for single-cell and spatial omics. 2024. Preprint at bioRxiv:2024.04.15.589472.
[20] Liu T, Li K, Wang Y, Li H, Zhao H. Evaluating the utilities of foundation models in single-cell data analysis. 2023. Preprint at bioRxiv:2023.09.08.555192.
[21] Wei J, Tay Y, Bommasani R, Raffel C, Zoph B, Borgeaud S, et al. Emergent abilities of large language models. 2022. Preprint at arXiv:2206.07682.