Determining correlations between molecules at various levels is an important topic in molecular biology. Large language models have demonstrated a remarkable ability to capture correlations from large amounts of data in natural language processing and image generation. Because the correlations they capture can be applied to a wide range of specific tasks, large language models are also referred to as foundation models. The massive amount of data available in molecular biology provides an excellent basis for developing foundation models, and the recent emergence of such models has substantially advanced the field. We summarize the foundation models developed from RNA sequence data, DNA sequence data, protein sequence data, single-cell transcriptome data, and spatial transcriptome data, and further discuss research directions for the development of foundation models in molecular biology.
ChatGPT explores a strategic blueprint of question answering (QA) for delivering medical diagnoses, treatment recommendations, and other healthcare support. This is achieved through the increasing incorporation of medical domain data via natural language processing (NLP) and multimodal paradigms. By shifting the distribution of text, images, videos, and other modalities from the general domain to the medical domain, these techniques have accelerated the progress of medical domain question answering (MDQA). They bridge the gap between human natural language and sophisticated medical domain knowledge or expert-provided manual annotations, and they handle large-scale, diverse, unbalanced, or even unlabeled data analysis scenarios in medical contexts. Our focus is the use of language models and multimodal paradigms for medical question answering, with the aim of guiding the research community in selecting appropriate mechanisms for their specific medical research requirements. We discuss in detail unimodal tasks such as question answering, reading comprehension, reasoning, diagnosis, relation extraction, and probability modeling, as well as multimodal tasks such as visual question answering, image captioning, cross-modal retrieval, and report summarization and generation. Each section examines the specifics of the respective method under consideration. This paper highlights the structures and advancements of medical domain methods compared with general domain methods, emphasizing their applications across different tasks and datasets. It also outlines current challenges and opportunities for future medical domain research. This review serves as an academic resource and charts a course for future investigation and application in the field of medical question answering.
CX-5461, also known as pidnarulex, is a strong G4 stabilizer and has received FDA fast-track designation for BRCA1- and BRCA2-mutated cancers. However, quantitative measurements of the unfolding rates of CX-5461-G4 complexes, which are important for the regulatory function of G4s, remain lacking. Here, we employ single-molecule magnetic tweezers to measure the unfolding force distributions of c-MYC G4s in the presence of different concentrations of CX-5461. The unfolding force distributions exhibit three discrete unfolding force peaks, corresponding to three binding modes. In combination with a fluorescence quenching assay and molecular docking to a previously reported ligand-c-MYC G4 structure, we assign the ~69 pN peak to the 1:1 (ligand:G4) complex, in which CX-5461 binds at the 5' end of the G4. The ~84 pN peak is attributed to the 2:1 complex, in which CX-5461 occupies both the 5' and 3' ends. Furthermore, using the Bell-Arrhenius model to fit the unfolding force distributions, we determined the zero-force unfolding rates of the 1:1 and 2:1 complexes to be (2.4 ± 0.9) × 10⁻⁸ s⁻¹ and (1.4 ± 1.0) × 10⁻⁹ s⁻¹, respectively. These findings provide valuable insights for the development of G4-targeted ligands to combat c-MYC-driven cancers.
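To illustrate how a zero-force unfolding rate relates to an unfolding force distribution, the following is a minimal Python sketch of the standard Bell-Arrhenius form, k(F) = k0 · exp(F·Δx / kBT), combined with a constant loading rate. It is not the fitting procedure used in the study. The constant loading rate, the transition distance `dx_nm`, the thermal energy value, and all function names are assumptions introduced here for illustration; the abstract does not report them.

```python
import math

# Thermal energy near room temperature in pN*nm (assumed; not stated in the abstract).
KBT_PN_NM = 4.11


def bell_rate(force_pn, k0, dx_nm, kbt=KBT_PN_NM):
    """Bell-Arrhenius unfolding rate k(F) = k0 * exp(F * dx / kBT), in 1/s."""
    return k0 * math.exp(force_pn * dx_nm / kbt)


def unfolding_force_pdf(force_pn, k0, dx_nm, loading_rate, kbt=KBT_PN_NM):
    """Probability density of the unfolding force (1/pN) when the force
    increases linearly in time at `loading_rate` (pN/s, assumed protocol)."""
    # Probability of surviving up to force F: exp(-(1/r) * integral_0^F k(f) df).
    exponent = (k0 * kbt / (loading_rate * dx_nm)) * (
        math.exp(force_pn * dx_nm / kbt) - 1.0
    )
    survival = math.exp(-exponent)
    return bell_rate(force_pn, k0, dx_nm, kbt) / loading_rate * survival


def most_probable_force(k0, dx_nm, loading_rate, kbt=KBT_PN_NM):
    """Peak of the density above: F* = (kBT/dx) * ln(r * dx / (k0 * kBT))."""
    return (kbt / dx_nm) * math.log(loading_rate * dx_nm / (k0 * kbt))


if __name__ == "__main__":
    # k0 is the 1:1 complex value reported in the abstract; dx_nm and
    # loading_rate are purely illustrative placeholders, not reported values.
    k0, dx_nm, loading_rate = 2.4e-8, 1.0, 1.0
    peak = most_probable_force(k0, dx_nm, loading_rate)
    print(f"most probable unfolding force: {peak:.1f} pN")
    print(f"density at the peak: {unfolding_force_pdf(peak, k0, dx_nm, loading_rate):.4f} per pN")
```

In an actual fit, `k0` and `dx_nm` would be the free parameters adjusted so that this density matches each measured force peak. In this model, a smaller `k0` shifts the peak to higher force, which is consistent with the 2:1 complex having both the higher peak force and the lower zero-force unfolding rate.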