Most accessed

    Evelyn Shue, Li Liu, Bingxin Li, Zifeng Feng, Xin Li, Gangqing Hu
    Quantitative Biology, 2023, 11(2): 105-108.

    The impressive conversational and programming abilities of ChatGPT make it an attractive tool for facilitating the education of bioinformatics data analysis for beginners. In this study, we proposed an iterative model to fine-tune instructions for guiding a chatbot in generating code for bioinformatics data analysis tasks. We demonstrated the feasibility of the model by applying it to various bioinformatics topics. Additionally, we discussed practical considerations and limitations regarding the use of the model in chatbot-aided bioinformatics education.

    Dong Xu
    Quantitative Biology, 2023, 11(2): 204-206.

    Junting Wang, Huan Tao, Hao Li, Xiaochen Bo, Hebing Chen
    Quantitative Biology, 2023, 11(2): 109-121.

    Background: The hierarchical three-dimensional (3D) architectures of chromatin play an important role in fundamental biological processes, such as cell differentiation, cellular senescence, and transcriptional regulation. Aberrant alterations of 3D chromatin structure are often present in human diseases, including cancers, but their underlying mechanisms remain unclear.

    Results: 3D chromatin structures (chromatin A/B compartments, topologically associating domains, and enhancer-promoter interactions) play key roles in cancer development, metastasis, and drug resistance. Bioinformatics techniques based on machine learning and deep learning have shown great potential in the study of the 3D cancer genome.

    Conclusion: Current advances in the study of the 3D cancer genome have expanded our understanding of the mechanisms underlying tumorigenesis and development. They will provide new insights into the precise diagnosis and personalized treatment of cancers.

    Qijin Yin, Rui Fan, Xusheng Cao, Qiao Liu, Rui Jiang, Wanwen Zeng
    Quantitative Biology, 2023, 11(3): 260-274.

    Background: Computational approaches for the accurate prediction of drug interactions, such as drug-drug interactions (DDIs) and drug-target interactions (DTIs), are in high demand among biochemical researchers. Although many methods have been proposed and developed to predict DDIs and DTIs respectively, their success is still limited due to a lack of systematic evaluation of the intrinsic properties embedded in the corresponding chemical structure.

    Methods: In this paper, we develop DeepDrug, a deep learning framework for overcoming the above limitation by using residual graph convolutional networks (Res-GCNs) and convolutional networks (CNNs) to learn the comprehensive structure- and sequence-based representations of drugs and proteins.

    Results: DeepDrug outperforms state-of-the-art methods in a series of systematic experiments, including binary-class DDI, multi-class/multi-label DDI, and binary-class DTI classification tasks, as well as DTI regression tasks. Furthermore, we visualize the structural features learned by the DeepDrug Res-GCN module, which display patterns consistent with chemical properties and drug categories, providing additional evidence to support the strong predictive power of DeepDrug. Finally, we apply DeepDrug to perform drug repositioning on the whole DrugBank database to discover potential drug candidates against SARS-CoV-2, where 7 of the 10 top-ranked drugs are reported to be repurposed to potentially treat coronavirus disease 2019 (COVID-19).

    Conclusions: To sum up, we believe that DeepDrug is an efficient tool for the accurate prediction of DDIs and DTIs and provides promising insight into the underlying mechanisms of these biochemical relations.

    Xuegong Zhang, Lei Wei, Rui Jiang, Xiaowo Wang, Jin Gu, Zhen Xie, Hairong Lv
    Quantitative Biology, 2023, 11(3): 207-213.

    The rapid development of biological technology (BT) and information technology (IT), especially of genomics and artificial intelligence (AI), is bringing great potential for revolutionizing future medicine. We propose the concept and framework of Digital Life Systems, or dLife, as a new paradigm to unleash this potential. It includes the multi-scale and multi-granule measurement and representation of life in the digital space, the mathematical and/or computational modeling of the biology behind physiological and pathological processes, and ultimately cyber twins of the healthy or diseased human body in the virtual space that can be used to simulate complex biological processes and deduce the effects of medical treatments. We advocate that dLife is the route toward future AI precision medicine and should be the new paradigm for future biological and medical research.

    Haiyan Gong, Zhengyuan Chen, Yuxin Tang, Minghong Li, Sichen Zhang, Xiaotong Zhang, Yang Chen
    Quantitative Biology, 2023, 11(2): 122-142.

    Background: As part of the cis-regulatory mechanism of the human genome, interactions between distal enhancers and proximal promoters play a crucial role. Enhancers, promoters, and enhancer-promoter interactions (EPIs) can be detected using many sequencing technologies and computational models. However, a systematic review that summarizes these EPI identification methods and that can help researchers apply and optimize them is still needed.

    Results: In this review, we first emphasize the role of EPIs in regulating gene expression and describe a generic framework for predicting enhancer-promoter interaction. Next, we review prediction methods for enhancers, promoters, loops, and enhancer-promoter interactions using different data features that have emerged since 2010, and we summarize the websites available for obtaining enhancers, promoters, and enhancer-promoter interaction datasets. Finally, we review the application of the methods for identifying EPIs in diseases such as cancer.

    Conclusions: Advances in computer technology have allowed traditional machine learning and deep learning methods to be used to predict enhancers, promoters, and EPIs from genetic, genomic, and epigenomic features. In the past decade, models based on deep learning, especially transfer learning, have been proposed for directly predicting enhancer-promoter interactions from DNA sequences, and these models can reduce the parameter training time required of bioinformatics researchers. We believe this review can provide detailed research frameworks for researchers who are beginning to study enhancers, promoters, and their interactions.

    Yuanpeng Xiong, Xuan He, Dan Zhao, Tao Jiang, Jianyang Zeng
    Quantitative Biology, 2023, 11(3): 275-286.

    Background: Chromatin-associated RNA (caRNA) acts as a ubiquitous epigenetic layer in eukaryotes, and has been reported to be essential in various biological processes, including gene transcription, chromatin remodeling and cellular differentiation. Recently, numerous experimental techniques have been developed to characterize genome-wide RNA-chromatin interactions to understand their underlying biological functions. However, these experimental methods are generally expensive, time-consuming, and limited in identifying all potential sites, while most of the existing computational methods are restricted to detecting only specific types of RNAs interacting with chromatin.

    Methods: Here, we propose a highly interpretable computational framework, named DeepRCI, to identify the interactions between various types of RNAs and chromatin. In this framework, we introduce a novel deep learning component called variformer and integrate multi-omics data to capture intrinsic genomic features at both RNA and DNA levels.

    Results: Extensive experiments demonstrate that DeepRCI can detect RNA-chromatin interactions more accurately when compared to the state-of-the-art baseline prediction methods. Furthermore, the sequence features extracted by DeepRCI can be well matched to known critical gene regulatory components, indicating that our model can provide useful biological insights into understanding the underlying mechanisms of RNA-chromatin interactions. In addition, based on the prediction results, we further delineate the relationships between RNA-chromatin interactions and cellular functions, including gene expression and the modulation of cell states.

    Conclusions: In summary, DeepRCI can serve as a useful tool for characterizing RNA-chromatin interactions and studying the underlying gene regulatory code.

    Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Sisi Qu, Alberto Maillo, Narsis A. Kiani, Felipe Prosper, Jesper Tegner, David Gomez-Cabrero
    Quantitative Biology, 2023, 11(3): 246-259.

    Background: Single-cell multi-omics technologies allow a profound systems-level understanding of the biology of cells and tissues. However, an integrative and possibly systems-based analysis capturing the different modalities is challenging. In response, bioinformatics and machine learning methodologies are being developed for multi-omics single-cell analysis. It is unclear whether current tools can address the dual aspect of modality integration and prediction across modalities without requiring extensive parameter fine-tuning.

    Methods: We designed LIBRA, a neural network based framework, to learn translation between paired multi-omics profiles so that a shared latent space is constructed. Additionally, we implemented a variation, aLIBRA, that allows automatic fine-tuning by identifying parameter combinations that optimize both the integrative and predictive tasks. All model parameters and evaluation metrics are made available to users with minimal user iteration. Furthermore, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries at GitHub (TranslationalBioinformaticsUnit/LIBRA).

    Results: LIBRA was evaluated on eight multi-omic single-cell datasets, including three combinations of omics. We observed that LIBRA is a state-of-the-art tool when evaluating the ability to increase cell-type (clustering) resolution in the integrated latent space. Furthermore, when assessing the predictive power across data modalities, such as predicting chromatin accessibility from gene expression, LIBRA outperforms existing tools. As expected, adaptive parameter optimization (aLIBRA) significantly boosted the performance of learning predictive models from paired datasets.

    Conclusion: LIBRA is a versatile tool that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. LIBRA is a data-driven robust platform that includes an adaptive learning scheme.

    Yan Yan, Liheng Yang, Leyuan Meng, Haochen Su, Cheng Zhou, Le Yu, Zhengtu Li, Xu Zhang, Huihua Cai, Juntao Gao
    Quantitative Biology, 2023, 11(3): 231-245.

    Background: Spatial multi-omics has proven to be a powerful approach for assisting researchers in genetic studies. In this review, bioimaging-based spatial multi-omics techniques such as seqFISH+, merFISH, integrated DNA seqFISH+, DNA merFISH, and MINA are introduced, along with each technique’s probe design, development, and imaging processes.

    Results: seqFISH employed 4–5 fluorophores for barcoding and conducted multiple rounds of hybridization so that mRNA can be identified through color-coding. seqFISH+ added 60 pseudo-colors and distributed them equally across three channels to enhance imaging power so that, for example, 24,000 genes can be imaged in total. merFISH used a 4-out-of-16 Hamming-distance encoding to provide a robust error-detecting scheme. MINA, a methodology combining merFISH (multiplexed error-robust fluorescence in situ hybridization) and chromosomal tracing, enabled multiplexed imaging of genomic architecture in single mammalian cells. Optical reconstruction of chromatin architecture (ORCA), a FISH variant that improves genetic resolution, traces the DNA path at the nanoscale with kilobase resolution, enables high-precision fiducial registration and sequential imaging, and uses Oligopaint probes to hybridize to short genomic regions ranging from 2 to 10 kilobases. ORCA then assigns individual barcodes to the primary probes of these short sections, to which fluorophores attach for imaging.

    Conclusion: This review provides a comprehensive overview of these spatial multi-omics techniques, with the intention of helping researchers select an appropriate technique for their research.
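
    The Hamming-distance error detection that the Results attribute to merFISH can be illustrated with a minimal sketch. It assumes one reading of the abstract's wording: 16-bit barcodes, each with four "on" bits, chosen so that any two barcodes differ in at least four positions. The function names and the greedy construction are hypothetical and are not taken from the review or from any merFISH software.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two barcodes differ."""
    return bin(a ^ b).count("1")


def build_codebook(n_bits: int = 16, weight: int = 4, min_dist: int = 4) -> list:
    """Greedily collect n_bits-wide words with `weight` ones that are
    pairwise at least `min_dist` bits apart (assumed scheme, not maximal)."""
    codebook = []
    for ones in combinations(range(n_bits), weight):
        word = sum(1 << i for i in ones)
        if all(hamming(word, c) >= min_dist for c in codebook):
            codebook.append(word)
    return codebook


def decode(word: int, codebook: list, max_errors: int = 1):
    """Return the nearest barcode if it is within max_errors bit flips,
    otherwise None (the read is flagged as an uncorrectable error)."""
    nearest = min(codebook, key=lambda c: hamming(word, c))
    return nearest if hamming(word, nearest) <= max_errors else None


codebook = build_codebook()
true_code = codebook[0]
one_flip = true_code ^ 0b1     # a single misread bit
two_flips = true_code ^ 0b11   # two misread bits

print(decode(one_flip, codebook) == true_code)  # single error is corrected
print(decode(two_flips, codebook))              # double error is only detected
```

    With a minimum distance of 4, a read with one flipped bit is still closer to its true barcode than to any other, so it can be corrected, while a read with two flipped bits matches no barcode within one flip and is rejected.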

    Chenfei Tian, Jianhua Li, Yong Wang
    Quantitative Biology, 2023, 11(3): 214-230.

    Background: As an increasing number of synthetic switches and circuits have been created for plant systems, and an increasing number of synthetic products have been produced in plant chassis, plant synthetic biology is gaining a strong foothold in agriculture and medicine. The ever-growing volume of data has also promoted the expansion of toolkits in this field. Genetic parts libraries and quantitative characterization approaches have been developed. However, plant synthetic biology is still in its infancy. Guidelines for selecting biological parts to design and construct genetic circuits with predictable functions are still needed.

    Results: In this article, we review current biotechnological progress in the field of plant synthetic biology. Assembly standardization and quantitative approaches for genetic parts and genetic circuits are discussed. We also highlight the main challenges in the iterative design-build-test-learn cycles for introducing novel traits into plants.

    Conclusion: Plant synthetic biology promises to provide important solutions to many issues in agricultural production, human health care, and environmental sustainability. However, tremendous challenges exist in this field. For example, the quantitative characterization of genetic parts is limited; the orthogonality and the transfer functions of circuits are unpredictable; and mathematical modeling-assisted circuit design still needs to improve in predictability and reliability. These challenges are expected to be resolved in the near future as interest in this field intensifies.

    Mateusz Chiliński, Anup Kumar Halder, Dariusz Plewczynski
    Quantitative Biology, 2023, 11(2): 155-162.

    Background: With the development of rapid and cheap sequencing techniques, the cost of whole-genome sequencing (WGS) has dropped significantly. However, the complexity of the human genome is not limited to the pure sequence, and additional experiments are required to learn the human genome’s influence on complex traits. One of the most exciting aspects for scientists nowadays is the spatial organisation of the genome, which can be discovered using spatial experiments (e.g., Hi-C, ChIA-PET). Information about spatial contacts helps in the analysis and brings new insights into our understanding of disease development.

    Methods: We have used an ensemble of deep learning with classical machine learning algorithms. The deep learning network we used was DNABERT, which utilises the BERT language model (based on transformers) for the genomic function. The classical machine learning models included support vector machines (SVMs), random forests (RFs), and K-nearest neighbor (KNN). The whole approach was wrapped together as deep hybrid learning (DHL).

    Results: We found that DNABERT can be used to predict ChIA-PET experiments with high precision. Additionally, the DHL approach improved the metrics on the CTCF and RNAPII sets.

    Conclusions: The DHL approach should be considered for models that utilise the power of deep learning. While conceptually straightforward, it can improve the results significantly.

    Yuwei Huang, Huidan Chang, Xiaoyi Chen, Jiayue Meng, Mengyao Han, Tao Huang, Liyun Yuan, Guoqing Zhang
    Quantitative Biology, 2023, 11(2): 163-174.

    Background: The precise and efficient analysis of single-cell transcriptome data provides powerful support for studying the diversity of cell functions at the single-cell level. The most important and challenging steps are cell clustering and recognition of cell populations. While the precision of clustering and annotation are considered separately in most current studies, it is worth attempting to develop an extensive and flexible strategy to balance clustering accuracy and biological explanation comprehensively.

    Methods: The cell marker-based clustering strategy (cmCluster), which is a modified Louvain clustering method, aims to search for the optimal clusters through a genetic algorithm (GA) and grid search based on the cell type annotation results.

    Results: Applying cmCluster to a set of single-cell transcriptome data showed that it was beneficial for the recognition of cell populations and the explanation of biological function, even with incomplete cell type information or multiple data resources. In addition, cmCluster produced clear boundaries and appropriate subtypes with potential marker genes. The relevant code is available on GitHub (huangyuwei301/cmCluster).

    Conclusions: We speculate that cmCluster provides researchers effective screening strategies to improve the accuracy of subsequent biological analysis, reduce artificial bias, and facilitate the comparison and analysis of multiple studies.

    Xiuquan Wang, Mian Umair Ahsan, Yunyun Zhou, Kai Wang
    Quantitative Biology, 2023, 11(3): 287-296.

    Background: Oxford Nanopore long-read sequencing technology addresses current limitations for DNA methylation detection that are inherent in short-read bisulfite sequencing or methylation microarrays. A number of analytical tools, such as Nanopolish, Guppy/Tombo and DeepMod, have been developed to detect DNA methylation on Nanopore data. However, additional improvements can be made in computational efficiency, prediction accuracy, and contextual interpretation on complex genomics regions (such as repetitive regions, low GC density regions).

    Method: In the current study, we apply Transformer architecture to detect DNA methylation on ionic signals from Oxford Nanopore sequencing data. Transformer is an algorithm that adopts self-attention architecture in the neural networks and has been widely used in natural language processing.

    Results: Compared to traditional deep-learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), Transformer may have specific advantages in DNA methylation detection, because the self-attention mechanism can help detect relationships between bases that are far from each other and pay more attention to important bases that carry characteristic methylation-specific signals within a specific sequence context.

    Conclusion: We demonstrated the ability of Transformers to detect methylation on ionic signal data.

    Kh Shahriya Zaman, Md Mamun Bin Ibne Reaz
    Quantitative Biology, 2023, 11(2): 175-182.

    Background: Machine learning has enabled the automatic detection of facial expressions, which is particularly beneficial in smart monitoring and understanding the mental state of medical and psychological patients. Most algorithms that attain high emotion classification accuracy require extensive computational resources, which either require bulky and inefficient devices or require the sensor data to be processed on cloud servers. However, there is always the risk of privacy invasion, data misuse, and data manipulation when the raw images are transferred to cloud servers for processing facial emotion recognition (FER) data. One possible solution to this problem is to minimize the movement of such private data.

    Methods: In this research, we propose an efficient implementation of a convolutional neural network (CNN) based algorithm for on-device FER on a low-power field programmable gate array (FPGA) platform. This is done by encoding the CNN weights to approximated signed digits, which reduces the number of partial sums to be computed for multiply-accumulate (MAC) operations. This is advantageous for portable devices that lack full-fledged resource-intensive multipliers.

    Results: We applied our approximation method on MobileNet-v2 and ResNet18 models, which were pretrained with the FER2013 dataset. Our implementations and simulations reduce the FPGA resource requirement by at least 22% compared to models with integer weight, with negligible loss in classification accuracy.

    Conclusions: The outcome of this research will help in the development of secure and low-power systems for FER and other biomedical applications. The approximation methods used in this research can also be extended to other image-based biomedical research fields.
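
    The idea in the Methods above, that a signed-digit encoding of weights reduces the number of partial sums in a multiply-accumulate operation, can be sketched with the standard canonical signed-digit (non-adjacent) form. This is an illustrative assumption: the abstract does not specify the authors' "approximated signed digits" encoding, and the function names and the truncation rule below are hypothetical.

```python
def to_csd(n: int) -> list:
    """Canonical signed-digit (non-adjacent form) of an integer weight.
    Digits are in {-1, 0, 1}, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def from_digits(digits: list) -> int:
    """Rebuild the integer from signed digits (least significant first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def approximate(n: int, max_nonzero: int) -> int:
    """Keep only the max_nonzero most significant nonzero digits
    (a hypothetical approximation rule, for illustration only)."""
    digits = to_csd(n)
    kept = 0
    for i in reversed(range(len(digits))):
        if digits[i] != 0:
            if kept < max_nonzero:
                kept += 1
            else:
                digits[i] = 0
    return from_digits(digits)


weight = 119                                   # binary 1110111: six nonzero bits
csd = to_csd(weight)
print(csd)                                     # signed digits, LSB first
print(sum(1 for d in csd if d != 0))           # nonzero digits = shift-add terms
print(approximate(weight, 2))                  # coarser weight with two terms
```

    Each nonzero digit corresponds to one shifted addition or subtraction of the input, so fewer nonzero digits mean fewer partial sums on hardware without full multipliers.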

    Qin Xie, Wei Ma, Jianhang Zhang, Shiliang Li, Xiaobing Deng, Youjun Xu, Weilin Zhang
    Quantitative Biology, 2023, 11(3): 320-331.

    Background: Molecular docking-based virtual screening (VS) aims to choose ligands with potential pharmacological activities from millions or even billions of molecules. This process could significantly cut down the number of compounds that need to be experimentally tested. However, during the docking calculation, many molecules have low affinity for a particular protein target, which wastes a lot of computational resources.

    Methods: We implemented a fast and practical molecular screening approach called DL-DockVS (deep learning dock virtual screening) by using deep learning models (regression and classification models) to learn the outcomes of pipelined docking programs step-by-step.

    Results: In this study, we showed that this approach could successfully weed out compounds with poor docking scores while keeping compounds with potentially high docking scores against 10 DUD-E protein targets. A self-built dataset of about 1.9 million molecules was used to further verify DL-DockVS, yielding good results in terms of recall rate, active compounds enrichment factor and runtime speed.

    Conclusions: We comprehensively evaluate the practicality and effectiveness of DL-DockVS against 10 protein targets. Owing to its improved runtime and maintained success rate, it would be a useful and promising approach for screening ultra-large compound libraries in the age of big data. Researchers can also conveniently reuse a model trained for one specific target to screen other chemical libraries and identify molecules with high docking scores without repeating the docking computation.

    Huawei Zhu, Yin Li
    Quantitative Biology, 2023, 11(2): 143-154.

    Background: Light-driven synthetic microbial consortia are composed of photoautotrophs and heterotrophs. They exhibit better stability, robustness, and capacity for handling complex tasks than axenic cultures. Unlike in general microbial consortia, the intrinsic property of photosynthetic oxygen evolution in light-driven synthetic microbial consortia is an important factor affecting the functions of the consortia.

    Results: In light-driven microbial consortia, the oxygen liberated by photoautotrophs results in an aerobic environment, which exerts dual effects on different species and processes. On one hand, oxygen is favorable to the synthetic microbial consortia when they are used for wastewater treatment and aerobic chemical production, in which biomass accumulation and oxidized product formation benefit from the high energy yield of aerobic respiration. On the other hand, oxygen is harmful to the synthetic microbial consortia when they are used for anaerobic processes including biohydrogen production and bioelectricity generation, in which the presence of oxygen deactivates some biological components and competes for electrons.

    Conclusions: Developing anaerobic processes using light-driven synthetic microbial consortia represents a cost-effective alternative for the production of chemicals from carbon dioxide and light. Thus, exploring a versatile approach to addressing the oxygen dilemma is essential to bring light-driven synthetic microbial consortia closer to practical applications.

    Ali Tarihi, Mojtaba Tarihi, Taki Tiraihi
    Quantitative Biology, 2023, 11(2): 183-203.

    Background: Morphogenesis is a complex process in a developing animal at the organ, cellular and molecular levels. In this investigation, allometry at the cellular level was evaluated.

    Methods: Geometric information, including the time-lapse Cartesian coordinates of each cell’s center, was used for calculating the allometric coefficients. A zero-centroaxial skew-symmetrical matrix (CSSM) was generated and used for constructing another square matrix (basic square matrix: BSM); the determinant of the BSM (d) was then calculated. The logarithms of absolute d (Lad) of cell groups at different stages of development were plotted for all of the cells in a range of development stages; the slope of the regression line was estimated and then used as the allometric coefficient. Moreover, the lineage growth rate (LGR) was calculated by plotting the Lad against the logarithm of the time. The complexity index at each stage was calculated. The method was tested on a developing Caenorhabditis elegans embryo.

    Results: We explored two of the first four blastomeres generated in the C. elegans embryo, ABp and EMS. The allometric coefficient of the ABp lineage was higher than that of EMS, which was consistent with the complexity index as well as the LGR.

    Conclusion: The conclusion of this study is that the complexity of the differentiating cells in a developing embryo can be evaluated by allometric scaling based on the data derived from the Cartesian coordinates of the cells at different stages of development.
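
    The regression step described in the Methods above (plotting the logarithm of |d| against the logarithm of time and taking the slope) can be sketched as an ordinary least-squares fit on log-transformed values. The abstract does not give the construction of the CSSM or BSM, so the determinants here are synthetic placeholder values, and the function name is hypothetical.

```python
import math


def loglog_slope(times, dets):
    """Least-squares slope of log|d| against log(t).
    Assumes every time and every determinant is nonzero."""
    lx = [math.log(t) for t in times]
    ly = [math.log(abs(d)) for d in dets]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    return sxy / sxx


# Synthetic placeholder data following d = 3 * t**2, so the slope is 2.
times = [1.0, 2.0, 4.0, 8.0]
dets = [-3.0 * t ** 2 for t in times]   # the sign is discarded by abs()
slope = loglog_slope(times, dets)
print(round(slope, 6))
```

    A larger slope for one lineage than for another would correspond to the comparison between ABp and EMS that the Results report.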

    Elham Dalalbashi Esfahani, Esmaeil Ebrahimie, Ali Niazi, Manijeh Mohammadi Dehcheshmeh
    Quantitative Biology, 2023, 11(3): 343-358.

    Background: Accumulating evidence shows that long non-coding RNAs (lncRNAs) play critical roles in cancer progression. The possible association between lncRNAs and herbal medicine is yet to be known. This study aims to identify medicinal herbs associated with lncRNAs by RNA-seq data for breast and prostate cancer.

    Methods: To develop the optimal approach for identifying cancer-related lncRNAs, we implemented two steps: (1) applying protein–protein interaction (PPI), Gene Ontology (GO), and pathway analyses, and (2) applying attribute weighting and finding the efficient classification model of the machine learning approach.

    Results: In the first step, GO terms and pathway analyses on differential co-expressed mRNAs revealed that lncRNAs were widely co-expressed with metabolic process genes. We identified two hub lncRNA-mRNA networks that implicate lncRNAs associated with breast and prostate cancer. In the second step, we implemented various machine learning-based prediction systems (Decision Tree, Random Forest, Deep Learning, and Gradient-Boosted Tree) on the non-transformed and Z-standardized differential co-expressed lncRNAs. Based on five-fold cross-validation, we obtained high accuracy (91.11%), high sensitivity (88.33%), and high specificity (93.33%) with Deep Learning, which reinforces the biomarker power of the lncRNAs identified in this study. As the data originally came from different cell lines at different durations of herbal treatment intervention, we applied seven attribute weighting algorithms to check the effects of variables on identifying lncRNAs. Attribute weighting results showed that the cell line and time had little or no effect on the selected lncRNA list. In addition, we identified one known lncRNA, downregulated RNA in cancer (DRAIC), as an essential feature.

    Conclusions: This study provides further insights for investigating potential therapeutic and prognostic targets shared by prostate cancer (PC) and breast cancer (BC).
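
    The three metrics reported in the Results above (accuracy, sensitivity, and specificity) follow from the counts of a binary confusion matrix. The sketch below uses made-up toy counts purely to show the arithmetic; they are not the study's data, and the function name is hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=8, fn=2, tn=9, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

    In a five-fold cross-validation such as the one the abstract mentions, these quantities would typically be computed per fold or on the pooled predictions; the abstract does not say which.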

    Nan Miles Xi, Angelos Vasilopoulos
    Quantitative Biology, 2023, 11(3): 297-305.

    Background: The existence of doublets in single-cell RNA sequencing (scRNA-seq) data poses a great challenge in downstream data analysis. Computational doublet-detection methods have been developed to remove doublets from scRNA-seq data. Yet, the default hyperparameter settings of those methods may not provide optimal performance.

    Methods: We propose a strategy to tune hyperparameters for a cutting-edge doublet-detection method. We utilize a full factorial design to explore the relationship between hyperparameters and detection accuracy on 16 real scRNA-seq datasets. The optimal hyperparameters are obtained by a response surface model and convex optimization.

    Results: We show that the optimal hyperparameters provide top performance across scRNA-seq datasets under various biological conditions. Our tuning strategy can be applied to other computational doublet-detection methods. It also offers insights into hyperparameter tuning for broader computational methods in scRNA-seq data analysis.

    Conclusions: The hyperparameter configuration significantly impacts the performance of computational doublet-detection methods. Our study is the first attempt to systematically explore the optimal hyperparameters under various biological conditions and optimization objectives. Our study provides much-needed guidance for hyperparameter tuning in computational doublet-detection methods.

    Doyoung Park
    Quantitative Biology, 2023, 11(3): 306-319.

    Background: Living cells need to undergo subtle shape adaptations in response to the topography of their substrates. These shape changes are mainly determined by reorganization of their internal cytoskeleton, with a major contribution from filamentous (F) actin. Bundles of F-actin play a major role in determining cell shape and their interaction with substrates, either as “stress fibers,” or as our newly discovered “Concave Actin Bundles” (CABs), which mainly occur while endothelial cells wrap micro-fibers in culture.

    Methods: To better understand the morphology and functions of these CABs, it is necessary to recognize and analyze as many of them as possible in complex cellular ensembles, which is a demanding and time-consuming task. In this study, we present a novel algorithm to automatically recognize CABs without further human intervention. We developed and employed a multilayer perceptron artificial neural network (“the recognizer”), which was trained to identify CABs.

    Results: The recognizer demonstrated a high overall recognition rate and reliability in both randomized training and subsequent testing experiments.

    Conclusion: The recognizer would be an effective replacement for validation by visual detection, which is both tedious and inherently prone to errors.

    Leandro R. Jones, Julieta M. Manrique
    Quantitative Biology, 2023, 11(3): 332-342.

    Background: Massively parallel sequencing of environmental DNA allows microbiological studies to be performed in greater detail than was possible with first-generation sequencing. For example, it facilitates the use of approaches hitherto largely applied to flora and fauna, such as rank abundance distribution (RAD) analyses.

    Methods: Here, we set out to advance the knowledge on Ca. Pelagibacterales (SAR11) communities from southern South America using environmental sequences from the open ocean in the Argentine sea, the uncharted Engaño Bay, as well as a river and an oligohaline shallow lake from the Patagonian Steppe ecoregion. The structures of the SAR11 assemblages present in these ecosystems were dissected by direct and rarefaction-based estimates of species richness, and by evaluations of the corresponding abundance distributions (ADs), which were addressed by RAD analyses.

    Results: Microbial community composition analyses revealed that the studied SAR11 assemblages coexist with 27 bacterial phyla. SAR11 richness was in general very high, but ADs turned out to be highly uneven. The results were compatible with prior knowledge, and similar to those derived from point estimates of diversity. However, our comprehensive dissection allowed for more detailed quantitative comparisons to be made between the environments surveyed, and revealed differences regarding both richness and the underlying ADs.

    Conclusions: Despite SAR11 assemblages being extremely rich, their ADs are very uneven. Richness and ADs can vary, not only between fresh and salt water, but also between oceanic and coastal marine environments. The obtained results provide insights on general topics such as adaptation and the contrast between marine and freshwater radiations.
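
    The two kinds of analysis named in the Methods can be illustrated with a minimal sketch, assuming only a vector of per-taxon read counts. The counts below are invented, and `rank_abundance` and `rarefied_richness` are hypothetical helper names; this is a generic illustration, not the authors' pipeline.

```python
import random


def rank_abundance(counts):
    """Return relative abundances sorted from most to least abundant taxon."""
    total = sum(counts)
    return [c / total for c in sorted(counts, reverse=True) if c > 0]


def rarefied_richness(counts, depth, trials=100, seed=0):
    """Mean number of distinct taxa observed when `depth` reads are drawn
    without replacement, averaged over `trials` random subsamples."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the total number of reads")
    rng = random.Random(seed)
    observed = [len(set(rng.sample(pool, depth))) for _ in range(trials)]
    return sum(observed) / trials


# Invented read counts for eight taxa: a few dominant, several rare.
counts = [120, 40, 20, 10, 5, 3, 1, 1]
rad = rank_abundance(counts)
richness_at_20 = rarefied_richness(counts, depth=20)
```

    A steeply falling `rad` curve corresponds to an uneven abundance distribution, while rarefied richness allows samples sequenced to different depths to be compared at a common depth.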

    Chao Pang, Henry H. Y. Tong, Leyi Wei
    Quantitative Biology, 2023, 11(4): 395-404.

    The prediction of molecular properties is a crucial task in the field of drug discovery. Computational methods that can accurately predict molecular properties can significantly accelerate the drug discovery process and reduce its cost. In recent years, iterative updates in computing hardware and the rise of deep learning have created a new and effective path for molecular property prediction. Deep learning methods can leverage the vast amount of data accumulated over the years in drug discovery and do not require complex feature engineering. In this review, we summarize molecular representations and commonly used datasets in molecular property prediction models and present advanced deep learning methods for molecular property prediction, including state-of-the-art deep learning networks such as graph neural networks and Transformer-based models, as well as state-of-the-art deep learning strategies such as 3D pre-training, contrastive learning, multi-task learning, transfer learning, and meta-learning. We also point out some critical issues such as lack of datasets, low information utilization, and lack of specificity for diseases.

    Ke Feng, Hongyang Jiang, Chaoyi Yin, Huiyan Sun
    Quantitative Biology, 2023, 11(4): 434-450.

    Gene regulatory network (GRN) inference from gene expression data is a significant approach to understanding aspects of the biological system. Compared with generalized correlation-based methods, causality-inspired ones seem better suited to inferring regulatory relationships. We propose GRINCD, a novel GRN inference framework empowered by graph representation learning and causal asymmetric learning, considering both linear and non-linear regulatory relationships. First, a high-quality representation of each gene is generated using a graph neural network. Then, we apply the additive noise model to predict the causal regulation of each regulator-target pair. Additionally, we design two channels and finally assemble them for robust prediction. Through comprehensive comparisons of our framework with state-of-the-art methods based on different principles on numerous datasets of diverse types and scales, the experimental results show that our framework achieves superior or comparable performance under various evaluation metrics. Our work provides a new clue for constructing GRNs, and our proposed framework GRINCD also shows potential in identifying key factors affecting cancer development.

    Hongfei Cui
    Quantitative Biology, 2023, 11(4): 451-470.

    The information on host–microbe interactions contained in the operational taxonomic unit (OTU) abundance table can serve as a clue to understanding the biological traits of OTUs and samples. Some studies have inferred the taxonomies or functions of OTUs by constructing co-occurrence networks, but co-occurrence networks can only encompass a small fraction of all OTUs due to the high sparsity of the OTU table. There is a lack of studies that intensively explore and use the information on sample-OTU interactions. This study constructed a sample-OTU heterogeneous information network and represented the nodes in the network through the heterogeneous graph embedding method to form the OTU space and sample space. Taking advantage of the represented OTU and sample vectors combined with the original OTU abundance information, an Integrated Model of Embedded Taxonomies and Abundance (IMETA) was proposed for predicting sample attributes, such as phenotypes and individual diet habits. Both the OTU space and sample space contain reasonable biological or medical semantic information, and the IMETA using embedded OTU and sample vectors can have stable and good performance in the sample classification tasks. This suggests that the embedding representation based on the sample-OTU heterogeneous information network can provide more useful information for understanding microbiome samples. This study produced quantitative representations of the biological characteristics within the OTUs and samples, which is a good attempt to increase the utilization rate of information in the OTU abundance table, and it promotes a deeper understanding of the underlying knowledge of the human microbiome.
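
    As a rough illustration of what a sample-OTU network can look like, the sketch below turns a small abundance table into weighted sample-to-OTU edges. The abstract does not describe how the network was actually constructed, so the rule used here (one edge per non-zero entry, weighted by abundance), the toy table, and the names `build_sample_otu_edges` and `table_sparsity` are assumptions made for illustration only.

```python
def build_sample_otu_edges(otu_table, sample_ids, otu_ids):
    """Return (sample, otu, abundance) edges for every non-zero table entry.

    Assumed rule: a sample node is linked to an OTU node whenever that OTU
    was observed in the sample, with the abundance as the edge weight.
    """
    edges = []
    for sample, row in zip(sample_ids, otu_table):
        for otu, abundance in zip(otu_ids, row):
            if abundance > 0:
                edges.append((sample, otu, abundance))
    return edges


def table_sparsity(otu_table):
    """Fraction of zero entries in the abundance table."""
    cells = [value for row in otu_table for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


# Toy abundance table: rows are samples, columns are OTUs (invented values).
sample_ids = ["S1", "S2", "S3"]
otu_ids = ["OTU_a", "OTU_b", "OTU_c", "OTU_d"]
otu_table = [
    [12, 0, 0, 3],
    [0, 7, 0, 0],
    [5, 0, 0, 1],
]
edges = build_sample_otu_edges(otu_table, sample_ids, otu_ids)
sparsity = table_sparsity(otu_table)
```

    Unlike a co-occurrence network, every OTU observed at least once receives an edge under this rule, however sparse the table is.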

    Junqi Zhang, Zixuan You, Dingyuan Liu, Rui Tang, Chao Zhao, Yingxiu Cao, Feng Li, Hao Song
    Quantitative Biology, 2023, 11(4): 405-420.

    Electroactive microorganisms (EAMs) can utilize extracellular electron transfer (EET) pathways to exchange electrons and energy with their external surroundings. Conductive cytochrome proteins and nanowires play crucial roles in controlling the electron transfer rate from the cytosol to the extracellular electrode. Many previous studies elucidated how the c-type cytochrome proteins and conductive nanowires are synthesized, assembled, and engineered to manipulate the EET rate, and quantified the kinetic processes of electron generation and EET. Here, we first give an overview of the electron transfer pathways of EAMs and quantify the kinetic parameters that dictate intracellular electron production and EET. Second, we systematically review the structure, conductivity mechanisms, and engineering strategies used to manipulate conductive cytochromes and nanowires in EAMs. Last, we outline potential directions for future research on cytochromes and conductive nanowires for enhanced electron transfer. This article reviews the quantitative kinetics of intracellular electron production and EET, and the contribution of engineered c-type cytochromes and conductive nanowires to enhancing the EET rate, which lays the foundation for enhancing the electron transfer capacity of EAMs.

    Michael Q. Zhang
    Quantitative Biology, 2023, 11(4): 359-362.
    Jianfeng Feng
    Quantitative Biology, 2023, 11(4): 471-473.
    Feiran Li, Yu Chen, Johan Gustafsson, Hao Wang, Yi Wang, Chong Zhang, Xinhui Xing
    Quantitative Biology, 2023, 11(4): 363-375.

    Over the last 15 years, genome-scale metabolic models (GEMs) have been reconstructed for humans and model animals, such as mouse and rat, to systematically understand metabolism, simulate multicellular or multi-tissue interplay, understand human diseases, and guide cell factory design for biopharmaceutical protein production. Here, we describe how metabolic networks can be represented using stoichiometric matrices and well-defined constraints for flux simulation. Then, we review the history of GEM development for quantitative understanding of Homo sapiens and other relevant animals, together with their applications. We describe how models have developed from H. sapiens to other animals and from generic purpose to precise context-specific simulation. The progress of GEMs for animals has greatly expanded our systematic understanding of metabolism in humans and related animals. We discuss the difficulties and present perspectives on GEM development and the quest to integrate more biological processes and omics data for future research and translation. We hope that this review can inspire new models for other mammalian organisms and new algorithms for integrating big data, enabling more in-depth analysis and further progress in human health and biopharmaceutical engineering.
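
    The stoichiometric representation mentioned above can be sketched on a toy network. In the standard constraint-based formulation, a flux vector v is admissible when S·v = 0 (no metabolite accumulates at steady state) and each flux lies within its lower and upper bounds. The three-reaction network, its bounds, and the names `net_production` and `is_feasible` below are hypothetical and are not taken from the review.

```python
def net_production(S, v):
    """Compute S·v: the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_feasible(S, v, bounds, tol=1e-9):
    """Check the steady-state condition S·v = 0 and the flux bounds."""
    steady = all(abs(rate) <= tol for rate in net_production(S, v))
    within = all(lo - tol <= v_j <= hi + tol for v_j, (lo, hi) in zip(v, bounds))
    return steady and within


# Toy network (rows: metabolites A and B; columns: reactions R1, R2, R3).
#   R1: uptake -> A      R2: A -> B      R3: B -> export
S = [
    [1, -1, 0],   # A is produced by R1 and consumed by R2
    [0, 1, -1],   # B is produced by R2 and consumed by R3
]
bounds = [(0, 10), (0, 1000), (0, 1000)]  # uptake R1 capped at 10 (assumed)

balanced = is_feasible(S, [10, 10, 10], bounds)      # all fluxes equal
accumulating = is_feasible(S, [10, 5, 5], bounds)    # A would pile up
over_uptake = is_feasible(S, [20, 20, 20], bounds)   # violates the R1 bound
```

    Flux simulation methods such as flux balance analysis search this feasible set for the flux vector that maximizes a chosen objective, typically with a linear programming solver.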

    Dali Wang, Jiaxuan Li, Lei Wang, Yipeng Cao, Bo Kang, Xiangfei Meng, Sai Li, Chen Song
    Quantitative Biology, 2023, 11(4): 421-433.

    The causative pathogen of coronavirus disease 2019 (COVID-19), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is an enveloped virus assembled from a lipid envelope and multiple structural proteins. In this study, by integrating experimental data, structural modeling, and coarse-grained and all-atom molecular dynamics (MD) simulations, we constructed multiscale models of SARS-CoV-2. Our 500-ns coarse-grained simulation of the intact virion allowed us to investigate the dynamic behavior of the membrane-embedded proteins and the surrounding lipid molecules in situ. Our results indicated that the membrane-embedded proteins are highly dynamic, and certain types of lipids exhibit various binding preferences to specific sites of the membrane-embedded proteins. The equilibrated virion model was transformed into atomic resolution, which provided a 3D structure for scientific demonstration and can serve as a framework for future exascale all-atom MD simulations. A short all-atom MD simulation of 255 ps was conducted as a preliminary test for large-scale simulations of this complex system.

    Siyu Li, Songming Tang, Yunchang Wang, Sijie Li, Yuhang Jia, Shengquan Chen
    Quantitative Biology, 2024, 12(1): 85-99.

    Recent advances in single-cell chromatin accessibility sequencing (scCAS) technologies have resulted in new insights into the characterization of epigenomic heterogeneity and have increased the need for automatic cell type annotation. However, existing automatic annotation methods for scCAS data fail to incorporate the reference data and neglect novel cell types, which only exist in a test set. Here, we propose RAINBOW, a reference-guided automatic annotation method based on the contrastive learning framework, which is capable of effectively identifying novel cell types in a test set. By utilizing contrastive learning and incorporating reference data, RAINBOW can effectively characterize the heterogeneity of cell types, thereby facilitating more accurate annotation. With extensive experiments on multiple scCAS datasets, we show the advantages of RAINBOW over state-of-the-art methods in known and novel cell type annotation. We also verify the effectiveness of incorporating reference data during the training process. In addition, we demonstrate the robustness of RAINBOW to data sparsity and the number of cell types. Furthermore, RAINBOW provides superior performance on newly sequenced data and can reveal biological implications in downstream analyses. All the results demonstrate the superior performance of RAINBOW in cell type annotation for scCAS data. We anticipate that RAINBOW will offer essential guidance and great assistance in scCAS data analysis. The source code is available on GitHub (BioX-NKU/RAINBOW).