Perspectives on benchmarking foundation models for network biology

Christina V. Theodoris

Quant. Biol. 2024, Vol. 12, Issue 4: 335-338. DOI: 10.1002/qub2.68
PERSPECTIVE


Abstract

Transfer learning has revolutionized fields including natural language understanding and computer vision by leveraging large-scale general datasets to pretrain models with foundational knowledge that is then transferred to improve predictions in a vast range of downstream tasks. More recently, transfer learning approaches have been increasingly adopted in biology, where models are pretrained on massive amounts of biological data and employed to make predictions in a broad range of biological applications. However, unlike natural language, where humans are well suited to evaluate models given a clear understanding of the ground truth, biology presents the unique challenge of a setting with a plethora of unknowns that must nonetheless abide by real-world physical constraints. This perspective discusses key points the field should consider in designing benchmarks for foundation models in network biology.
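To make the pretrain-then-transfer paradigm concrete, below is a minimal, hypothetical sketch in PyTorch: a small transformer encoder is first pretrained with a self-supervised masked-token objective on abundant unlabeled data, then the same encoder is reused with a new classification head for a small labeled downstream task. All names, shapes, and hyperparameters (Encoder, VOCAB_SIZE, pretrain_step, finetune_step, and so on) are illustrative assumptions for this sketch, not the architecture or API of any specific published foundation model.

# Minimal sketch of the pretrain-then-fine-tune paradigm (hypothetical;
# not the architecture of any specific published model).
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # e.g., one token per gene (illustrative assumption)
MASK_ID = 0         # reserved mask token
SEQ_LEN = 64
D_MODEL = 128

class Encoder(nn.Module):
    """Small transformer encoder shared by pretraining and fine-tuning."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                  # (batch, seq) -> (batch, seq, d)
        return self.encoder(self.embed(tokens))

def pretrain_step(encoder, lm_head, tokens, optimizer, mask_frac=0.15):
    """Self-supervised masked-token objective on unlabeled data."""
    mask = torch.rand(tokens.shape) < mask_frac
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = lm_head(encoder(corrupted))        # predict the original tokens
    loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def finetune_step(encoder, clf, tokens, labels, optimizer):
    """Supervised downstream task reusing the pretrained encoder."""
    pooled = encoder(tokens).mean(dim=1)        # simple mean pooling
    loss = nn.functional.cross_entropy(clf(pooled), labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Phase 1: pretrain on abundant unlabeled examples.
enc = Encoder()
lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)
opt = torch.optim.Adam(list(enc.parameters()) + list(lm_head.parameters()))
unlabeled = torch.randint(1, VOCAB_SIZE, (8, SEQ_LEN))
pretrain_step(enc, lm_head, unlabeled, opt)

# Phase 2: fine-tune the same encoder on a small labeled downstream task.
clf = nn.Linear(D_MODEL, 2)                     # e.g., a binary phenotype label
opt_ft = torch.optim.Adam(list(enc.parameters()) + list(clf.parameters()))
labeled = torch.randint(1, VOCAB_SIZE, (4, SEQ_LEN))
labels = torch.randint(0, 2, (4,))
finetune_step(enc, clf, labeled, labels, opt_ft)

The key design point illustrated is that the encoder's pretrained weights carry over between the two phases, so the downstream task benefits from foundational knowledge learned on the larger unlabeled corpus.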

Keywords

benchmarking strategy / foundation models / network biology / transfer learning

Cite this article

Christina V. Theodoris. Perspectives on benchmarking foundation models for network biology. Quant. Biol., 2024, 12(4): 335‒338 https://doi.org/10.1002/qub2.68


RIGHTS & PERMISSIONS

© 2024 The Author(s). Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.