Sequence modeling is a long-standing research problem in natural language processing, and advances in NLP have shown that language models trained on huge amounts of unlabeled sequences, especially those based on the Transformer architecture, model sequences remarkably well. This success quickly spread to other research domains, and protein science was among the first to adopt language models. Early protein language models were trained mainly on protein sequence databases with the objective of predicting the content of masked portions of the input sequence. SeqVec (Heinzinger et al. 2019) was trained with an LSTM-based network, and its analysis showed that the resulting protein representations characterize protein stability well. TAPE (Rao et al. 2019) trained language models with CNN (convolutional neural network) (LeCun et al. 1998), LSTM, and Transformer backbones, and demonstrated that the Transformer-based language models outperform the other architectures. ESM-1b (Rives et al. 2021) widened and deepened the network to increase the parameter count roughly 17-fold over TAPE-Transformer, and replaced the Pfam (Mistry et al. 2021) database used to train TAPE-Transformer with UniRef50 (Mirdita et al. 2017); the reported results show that ESM-1b significantly outperforms TAPE-Transformer on the core task of capturing residue interactions (residue contact prediction) as well as on downstream tasks such as protein stability prediction and secondary structure prediction, which has made ESM-1b one of the most widely used protein language models. ProtTrans (Elnaggar et al. 2022) investigated how language model architecture and sequence database size affect performance by training models of several architectures on a variety of protein sequence databases, and found that the T5-XL (Raffel et al. 2019) model trained on UniRef50 slightly outperforms ESM-1b on downstream tasks such as secondary structure prediction and protein subcellular localization prediction.
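To make this masked-prediction objective concrete, the following is a minimal PyTorch sketch of BERT-style masked-token training on a protein sequence. The tiny model, tokenization scheme, and 15% mask rate are illustrative assumptions for exposition only, not the configuration of SeqVec, TAPE, or ESM-1b.

```python
# Minimal sketch (illustrative only) of the masked-token objective used to
# pre-train Transformer protein language models.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # 20 standard residues
MASK_ID = len(AMINO_ACIDS)                    # extra <mask> token id
VOCAB_SIZE = len(AMINO_ACIDS) + 1

def tokenize(seq: str) -> torch.Tensor:
    return torch.tensor([[AMINO_ACIDS.index(a) for a in seq]])   # shape (1, L)

class TinyProteinLM(nn.Module):
    """Toy Transformer encoder with a masked-token prediction head.
    Real models also add positional encodings, omitted here for brevity."""
    def __init__(self, d_model: int = 64, nhead: int = 4, nlayers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(self.embed(tokens)))       # (1, L, vocab)

def masked_lm_loss(model: nn.Module, tokens: torch.Tensor, mask_rate: float = 0.15):
    """Hide a random subset of residues and score the model's reconstruction."""
    mask = torch.rand(tokens.shape) < mask_rate
    mask[0, 0] = True                          # ensure at least one masked position
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    # Cross-entropy is computed only on the masked positions, BERT-style.
    return nn.functional.cross_entropy(logits[mask], tokens[mask])

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # arbitrary toy sequence
model = TinyProteinLM()
loss = masked_lm_loss(model, tokenize(sequence))
loss.backward()                                 # one illustrative training step
print(f"masked-token loss: {loss.item():.3f}")
```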
Accurate protein structure prediction is a long-standing challenge in protein science, especially when only a single sequence is available. Given the strong ability of protein language models to capture residue interactions, trRosettaX-Single (Wang et al. 2022), RGN2 (Chowdhury et al. 2022), EMBER2 (Ben-Tal and Kolodny 2022), OmegaFold (Wu et al. 2022), and ESMFold (Lin et al. 2023) attempt to achieve accurate single-sequence protein structure prediction with protein language models. These methods not only surpass the traditional "MSA-Contact/Distance-Structure" paradigm in prediction speed, but also retain some predictive ability for orphan proteins that lack homologous sequences.
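To illustrate what single-sequence prediction looks like in practice, the sketch below folds one sequence with the publicly released ESMFold model through the fair-esm package. It assumes fair-esm is installed with the esmfold extra and that a GPU is available, and the example sequence is arbitrary rather than taken from the cited work.

```python
# Sketch of single-sequence structure prediction with ESMFold via the fair-esm
# package; assumes `pip install "fair-esm[esmfold]"` and an available GPU.
import torch
import esm

model = esm.pretrained.esmfold_v1()          # downloads weights on first use
model = model.eval().cuda()

# No MSA or template search: the only input is the amino-acid sequence itself.
sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"

with torch.no_grad():
    pdb_string = model.infer_pdb(sequence)   # returns the structure as PDB text

with open("prediction.pdb", "w") as handle:
    handle.write(pdb_string)
print("wrote prediction.pdb")
```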
In addition to protein structure prediction, LMSuccSite (Pokharel et al. 2022) applied protein language models to protein succinylation site prediction, IDP-LM (Pang and Liu 2023) applied them to protein intrinsic disorder prediction, and DeepGOPlus (Kulmanov and Hoehndorf 2020) applied them to protein function prediction, all with favorable results.
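A common pattern behind such applications is to freeze the language model, extract per-residue or per-protein embeddings, and train a lightweight predictor on top of them. The sketch below illustrates this pattern with ESM-1b via the fair-esm package; the mean-pooling step and the linear probe are illustrative stand-ins, not the actual architectures of LMSuccSite, IDP-LM, or DeepGOPlus.

```python
# Sketch of the embed-then-predict pattern: frozen ESM-1b features feeding a
# small task head. Assumes `pip install fair-esm`; the linear probe is a
# stand-in for whatever downstream predictor a given method actually uses.
import torch
import torch.nn as nn
import esm

model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

data = [("toy_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])            # final-layer representations
reps = out["representations"][33]                    # (1, L+2, 1280) incl. BOS/EOS
per_protein = reps[0, 1 : len(data[0][1]) + 1].mean(dim=0)   # mean-pool residues

# Hypothetical downstream head: a binary probe on the frozen embedding.
probe = nn.Linear(per_protein.numel(), 1)
score = torch.sigmoid(probe(per_protein))
print(f"toy downstream prediction: {score.item():.3f}")
```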