Dec 2024, Volume 12 Issue 4
    

  • PERSPECTIVE
    Christina V. Theodoris
    2024, 12(4): 335-338. https://doi.org/10.1002/qub2.68

    Transfer learning has revolutionized fields including natural language understanding and computer vision by leveraging large‐scale general datasets to pretrain models with foundational knowledge that can then be transferred to improve predictions in a vast range of downstream tasks. More recently, transfer learning approaches have been increasingly adopted in biological fields, where models have been pretrained on massive amounts of biological data and employed to make predictions in a broad range of biological applications. However, unlike in natural language, where humans are well suited to evaluate models given a clear understanding of the ground truth, biology presents the unique challenge of a setting with a plethora of unknowns that must at the same time abide by real‐world physical constraints. This perspective discusses some key points we should consider as a field in designing benchmarks for foundation models in network biology.

  • PERSPECTIVE
    Ziyu Chen, Lin Wei, Ge Gao
    2024, 12(4): 339-344. https://doi.org/10.1002/qub2.69

    Transformer‐based foundation models such as ChatGPT have revolutionized our daily life and affected many fields including bioinformatics. In this perspective, we first discuss the direct application of textual foundation models to bioinformatics tasks, focusing on how to make the most of canonical large language models and mitigate their inherent flaws. We then go through the transformer‐based, bioinformatics‐tailored foundation models for both sequence and non‐sequence data. Finally, we envision further development directions as well as challenges for bioinformatics foundation models.

  • REVIEW ARTICLE
    Jinge Wang, Zien Cheng, Qiuming Yao, Li Liu, Dong Xu, Gangqing Hu
    2024, 12(4): 345-359. https://doi.org/10.1002/qub2.67

    The year 2023 marked a significant surge in the exploration of applying large language model chatbots, notably Chat Generative Pre‐trained Transformer (ChatGPT), across various disciplines. We surveyed the application of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.

  • RESEARCH ARTICLE
    Muhammad Azam, Yibo Chen, Micheal Olaolu Arowolo, Haowang Liu, Mihail Popescu, Dong Xu
    2024, 12(4): 360-374. https://doi.org/10.1002/qub2.57

    Understanding complex biological pathways, including gene–gene interactions and gene regulatory networks, is critical for exploring disease mechanisms and drug development. Manual literature curation of biological pathways cannot keep up with the exponential growth of new discoveries in the literature. Large‐scale language models (LLMs) trained on extensive text corpora contain rich biological information, and they can be mined as a biological knowledge graph. This study assesses 21 LLMs, including both application programming interface (API)‐based models and open‐source models, in their capacity to retrieve biological knowledge. The evaluation focuses on predicting gene regulatory relations (activation, inhibition, and phosphorylation) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway components. Results indicated a significant disparity in model performance. API‐based models GPT‐4 and Claude‐Pro showed superior performance, with F1 scores of 0.4448 and 0.4386 for the gene regulatory relation prediction, and Jaccard similarity indices of 0.2778 and 0.2657 for the KEGG pathway prediction, respectively. Open‐source models lagged behind their API‐based counterparts; among them, Falcon‐180b and llama2‐7b had the highest F1 scores, 0.2787 and 0.1923, respectively, in gene regulatory relations. The KEGG pathway recognition had a Jaccard similarity index of 0.2237 for Falcon‐180b and 0.2207 for llama2‐7b. Our study suggests that LLMs are informative in gene network analysis and pathway mapping, but their effectiveness varies, necessitating careful model selection. This work also provides a case study and insight into using LLMs as knowledge graphs. Our code is publicly available on GitHub (Muh‐aza).
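
For readers unfamiliar with the two metrics reported in this abstract, the following is a minimal sketch of how an F1 score and a Jaccard similarity index can be computed between a predicted set and a reference set. It is not the authors' evaluation code, and the abstract does not describe their exact protocol. The function names and the gene lists are hypothetical, chosen only for illustration, and the lists are not actual KEGG pathway contents.

```python
def jaccard_index(predicted, reference):
    """Jaccard similarity: |intersection| / |union| of two collections."""
    p, r = set(predicted), set(reference)
    if not p and not r:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(p & r) / len(p | r)


def f1_score(predicted, reference):
    """F1: harmonic mean of precision and recall over set membership."""
    p, r = set(predicted), set(reference)
    true_positives = len(p & r)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(p)
    recall = true_positives / len(r)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: genes a model lists for a pathway vs. a curated list.
predicted_genes = ["TP53", "MDM2", "CDKN1A", "BAX"]
reference_genes = ["TP53", "MDM2", "ATM", "CHEK2", "BAX"]

print(jaccard_index(predicted_genes, reference_genes))  # 3 shared / 6 total
print(f1_score(predicted_genes, reference_genes))       # precision 3/4, recall 3/5
```

Both metrics reward overlap and penalize spurious or missing items, so a Jaccard index near 0.28, as reported for the best model, means that well under half of the combined predicted and reference components were shared.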

  • RESEARCH ARTICLE
    Junjie Tang, Changhu Wang, Feiyi Xiao, Ruibin Xi
    2024, 12(4): 375-388. https://doi.org/10.1002/qub2.64

    A gene regulatory network (GRN) is the complex network formed by regulatory interactions between genes in living cells. In this paper, we consider inferring GRNs in single cells based on single‐cell RNA sequencing (scRNA‐seq) data. In scRNA‐seq, single cells are often profiled from mixed populations, and their cell identities are unknown. A common practice for single‐cell GRN analysis is to first cluster the cells and infer GRNs for every cluster separately. However, this two‐step procedure ignores uncertainty in the clustering step and thus could lead to inaccurate estimation of the networks. Here, we consider the mixture Poisson log‐normal model (MPLN) for network inference of count data from mixed populations. The precision matrices of the MPLN are the GRNs of different cell types. To avoid the intractable optimization of the MPLN’s log‐likelihood, we develop an algorithm called variational mixture Poisson log‐normal (VMPLN) to jointly estimate the GRNs of different cell types based on the variational inference method. We compare VMPLN with state‐of‐the‐art single‐cell regulatory network inference methods. Comprehensive simulation shows that VMPLN achieves better performance, especially in scenarios where different cell types have a high mixing degree. Benchmarking on real scRNA‐seq data also demonstrates that VMPLN can provide more accurate network estimation in most cases. Finally, we apply VMPLN to a large scRNA‐seq dataset from patients infected with severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) and find that VMPLN identifies critical differences in regulatory networks in immune cells between patients with moderate and severe symptoms. The source code is available on GitHub (github.com/XiDsLab/SCVMPLN).

  • RESEARCH ARTICLE
    Yunjie Shi, Yun Cheng, Peiyu Chen, Lexiang Zhang, Fangfu Ye
    2024, 12(4): 389-399. https://doi.org/10.1002/qub2.58

    Breast cancer constitutes a significant global health burden, while conventional diagnostic approaches may lack precision and can be uncomfortable for patients. Exosomes have emerged as promising biomarkers for breast cancer due to their participation in diverse pathological processes, and a convenient analysis platform is expected to greatly promote their application. In this study, we propose a novel digital PCR approach utilizing near‐infrared (NIR) photo‐responsive thermosensitive microcarriers integrated with black phosphorus for quantifying microRNA (miRNA) biomarkers within exosomes. Petal‐like biomimetic nanomaterials were first assembled for non‐specific exosome capture based on the affinity effect of avidin and biotin. Photothermal‐responsive microcarriers, fabricated using gelatin‐based substrates blended with photothermal nanocomposite, exhibited NIR‐induced heating and reversible phase transition properties. We optimized synthesis parameters on thermal response and established a programmable and controllable NIR light source module. The results indicated a significant elevation in the levels of biomarkers miRNA‐1246 and miRNA‐122, with fold increases ranging from 6.2 to 23.6 and 5.9 to 13.0, respectively, in breast cancer cell lines MCF‐7 and MDA‐MB‐231 compared to healthy control cells HUVEC. This study offers broad prospects for utilizing exosomes to resolve predictive biomarkers.

  • RESEARCH ARTICLE
    Gabor Kiss, Salissou Moutari, Cara Mctaggart, Lynsey Patterson, Frank Kee, Felicity Lamrock
    2024, 12(4): 400-413. https://doi.org/10.1002/qub2.50

    This study introduces a deterministic formulation for modelling the asymptotic spread of a vaccine‐preventable disease as well as the different stages of the progression of the disease. We derive the formula for the associated basic reproduction number. To illustrate the proposed model, we use data from the 2017–2018 diphtheria outbreak in Yemen and fit the parameters of the model. A sensitivity analysis of the basic reproduction number with respect to the model parameters shows that this number increases as the transmission rate increases and decreases as the vaccination rate increases.

  • RESEARCH ARTICLE
    Sarawoot Somin, Don Kulasiri, Sandhya Samarasinghe
    2024, 12(4): 414-432. https://doi.org/10.1002/qub2.61

    The insulin‐degrading enzyme (IDE) plays a significant role in the degradation of amyloid beta (Aβ), a peptide found in the brain regions of patients with early Alzheimer’s disease. Adenosine triphosphate (ATP) allosterically regulates the Aβ‐degrading activity of IDE. The present study investigates the electrostatic interactions between ATP and IDE at the allosteric site of IDE, including thermostabilities/flexibilities of IDE residues, which have not yet been explored systematically. This study applies quantum mechanics/molecular mechanics (QM/MM) to the proposed computational model for exploring electrostatic interactions between ATP and IDE. Molecular dynamics (MD) simulations are performed at different temperatures to identify flexible and thermostable residues of IDE. The proposed computational model predicts QM/MM energy‐minimised structures that identify the IDE residues with high binding affinities (Lys530 and Asp385). Root mean square fluctuation values from the MD simulations at 300.00 K and at heat‐shock temperatures (321.15 K and 315.15 K) indicate that Lys530 and Asp385 are also the thermostable residues of IDE, whereas Ser576 and Lys858 have high flexibilities with compromised thermostabilities. The present study sheds light on the phenomenon of biological recognition and interactions at the ATP‐binding domain, which may have important implications for pharmacological drug design. The proposed computational model may facilitate the development of allosteric IDE activators/inhibitors, which mimic ATP interactions.

  • COMMENTARY
    Minsheng Hao, Lei Wei, Fan Yang, Jianhua Yao, Christina V. Theodoris, Bo Wang, Xin Li, Ge Yang, Xuegong Zhang
    2024, 12(4): 433-443. https://doi.org/10.1002/qub2.65