Cover illustration
The rapid development of biological technology (BT) and information technology (IT), especially of genomics and artificial intelligence (AI), is bringing great potential for revolutionizing future medicine. Zhang et al. proposed the concept and framework of Digital Life Systems, or dLife, as a new paradigm to unleash this potential. The framework aims to build cyber twins of the healthy or diseased human body through quantitative measurement, informatic representation, and mathematic ...
The rapid development of biological technology (BT) and information technology (IT), especially of genomics and artificial intelligence (AI), is bringing great potential for revolutionizing future medicine. We propose the concept and framework of Digital Life Systems, or dLife, as a new paradigm to unleash this potential. It includes the multi-scale and multi-granule measurement and representation of life in the digital space, the mathematical and/or computational modeling of the biology behind physiological and pathological processes, and ultimately cyber twins of the healthy or diseased human body in the virtual space that can be used to simulate complex biological processes and deduce the effects of medical treatments. We advocate that dLife is the route toward future AI precision medicine and should be the new paradigm for future biological and medical research.
Background: As an increasing number of synthetic switches and circuits have been created for plant systems, and a growing number of synthetic products have been produced in plant chassis, plant synthetic biology is gaining a strong foothold in agriculture and medicine. The rapid growth of data has also promoted the expansion of toolkits in this field. Genetic parts libraries and quantitative characterization approaches have been developed. However, plant synthetic biology is still in its infancy. Guidelines for selecting biological parts to design and construct genetic circuits with predictable functions are still needed.
Results: In this article, we review current biotechnological progress in the field of plant synthetic biology. Assembly standardization and quantitative approaches for genetic parts and genetic circuits are discussed. We also highlight the main challenges in the iterative design-build-test-learn cycles used to introduce novel traits into plants.
Conclusion: Plant synthetic biology promises to provide important solutions to many issues in agricultural production, human health care, and environmental sustainability. However, tremendous challenges exist in this field. For example, the quantitative characterization of genetic parts is limited; the orthogonality and the transfer functions of circuits are unpredictable; and mathematical modeling-assisted circuit design still needs to improve in predictability and reliability. These challenges are expected to be resolved in the near future as interest in this field intensifies.
Background: Spatial multi-omics has proven to be a powerful approach for assisting researchers in genetic studies. In this review, bioimaging-based spatial multi-omics techniques such as seqFISH+, merFISH, integrated DNA seqFISH+, DNA merFISH, and MINA are introduced, along with each technique’s probe design, development, and imaging processes.
Results: seqFISH employed 4–5 fluorophores for barcoding and conducted multiple rounds of hybridization so that mRNAs can be identified through color-coding. seqFISH+ added 60 pseudo-colors and distributed them equally across three channels to enhance imaging power so that, for example, 24,000 genes can be imaged in total. merFISH used a 16-bit code with a Hamming distance of 4 to provide a robust error-detecting method. MINA, a methodology combining merFISH (multiplexed error-robust fluorescence in situ hybridization) and chromosomal tracing, enabled multiplexed imaging of genomic architecture in single mammalian cells. Optical reconstruction of chromatin architecture (ORCA), a FISH variant that traces DNA paths at the nanoscale with kilobase resolution, improved genetic resolution, enabled high-precision fiducial registration and sequential imaging, and used Oligopaint probes to hybridize short genomic regions ranging from 2 to 10 kilobases. ORCA then assigns individual barcodes to these short-section primary probes so that fluorophores can attach and the regions can be imaged.
Conclusion: This review provides a comprehensive overview of these spatial multi-omics techniques, with the intention of helping researchers select an appropriate technique for their research.
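The error-robust barcoding idea described for merFISH, and the capacity arithmetic quoted for seqFISH+, can be illustrated with a minimal Python sketch. The four-gene codebook, the gene names, the `decode` helper, and the default of three barcoding rounds are illustrative assumptions, not the published codebooks or protocols; the round count was chosen only because it reproduces the 24,000-gene figure quoted above.

```python
def hamming(a: str, b: str) -> int:
    """Number of bit positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy codebook: 16-bit words, four 1-bits each,
# every pair of words at Hamming distance >= 4.
CODEBOOK = {
    "geneA": "1111000000000000",
    "geneB": "0000111100000000",
    "geneC": "1100110000000000",
    "geneD": "0011001100000000",
}


def decode(readout: str, codebook=CODEBOOK, max_errors: int = 1):
    """Return the unique gene within max_errors bit flips, else None."""
    hits = [gene for gene, word in codebook.items()
            if hamming(readout, word) <= max_errors]
    return hits[0] if len(hits) == 1 else None


def seqfish_plus_capacity(pseudocolors: int = 60, channels: int = 3,
                          barcoding_rounds: int = 3) -> int:
    """Barcode capacity when pseudo-colors are split equally across channels.

    barcoding_rounds=3 is an assumption, not a value stated in the text.
    """
    per_channel = pseudocolors // channels
    return channels * per_channel ** barcoding_rounds
```

With a minimum pairwise distance of 4, a readout with one flipped bit is still closest to its true barcode and can be corrected, whereas a readout with two flipped bits matches no barcode within one error and is only flagged (here, `decode` returns `None`).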
Background: Single-cell multi-omics technologies allow a deep, systems-level understanding of the biology of cells and tissues. However, an integrative and possibly systems-based analysis that captures the different modalities is challenging. In response, bioinformatics and machine learning methodologies are being developed for multi-omics single-cell analysis. It is unclear whether current tools can address the dual aspect of modality integration and prediction across modalities without requiring extensive parameter fine-tuning.
Methods: We designed LIBRA, a neural network based framework, to learn translation between paired multi-omics profiles so that a shared latent space is constructed. Additionally, we implemented a variation, aLIBRA, that allows automatic fine-tuning by identifying parameter combinations that optimize both the integrative and predictive tasks. All model parameters and evaluation metrics are made available to users with minimal user iteration. Furthermore, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries at GitHub (TranslationalBioinformaticsUnit/LIBRA).
Results: LIBRA was evaluated on eight multi-omic single-cell datasets, including three combinations of omics. We observed that LIBRA is a state-of-the-art tool when evaluating the ability to increase cell-type (clustering) resolution in the integrated latent space. Furthermore, when assessing the predictive power across data modalities, such as predicting chromatin accessibility from gene expression, LIBRA outperforms existing tools. As expected, adaptive parameter optimization (aLIBRA) significantly boosted the performance of learning predictive models from paired datasets.
Conclusion: LIBRA is a versatile tool that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. LIBRA is a data-driven robust platform that includes an adaptive learning scheme.
Background: Computational approaches for accurate prediction of drug interactions, such as drug-drug interactions (DDIs) and drug-target interactions (DTIs), are in high demand among biochemical researchers. Although many methods have been proposed and developed to predict DDIs and DTIs respectively, their success is still limited by a lack of systematic evaluation of the intrinsic properties embedded in the corresponding chemical structures.
Methods: In this paper, we develop DeepDrug, a deep learning framework for overcoming the above limitation by using residual graph convolutional networks (Res-GCNs) and convolutional networks (CNNs) to learn the comprehensive structure- and sequence-based representations of drugs and proteins.
Results: DeepDrug outperforms state-of-the-art methods in a series of systematic experiments, including binary-class DDI, multi-class/multi-label DDI, and binary-class DTI classification tasks, as well as DTI regression tasks. Furthermore, we visualize the structural features learned by the DeepDrug Res-GCN module, which display patterns consistent with chemical properties and drug categories, providing additional evidence to support the strong predictive power of DeepDrug. Finally, we apply DeepDrug to perform drug repositioning on the whole DrugBank database to discover potential drug candidates against SARS-CoV-2, where 7 out of the 10 top-ranked drugs are reported to be repurposed to potentially treat coronavirus disease 2019 (COVID-19).
Conclusions: To sum up, we believe that DeepDrug is an efficient tool for accurately predicting DDIs and DTIs and provides promising insight into the mechanisms underlying these biochemical relations.
Background: Chromatin-associated RNA (caRNA) acts as a ubiquitous epigenetic layer in eukaryotes, and has been reported to be essential in various biological processes, including gene transcription, chromatin remodeling and cellular differentiation. Recently, numerous experimental techniques have been developed to characterize genome-wide RNA-chromatin interactions to understand their underlying biological functions. However, these experimental methods are generally expensive, time-consuming, and limited in identifying all potential sites, while most of the existing computational methods are restricted to detecting only specific types of RNAs interacting with chromatin.
Methods: Here, we propose a highly interpretable computational framework, named DeepRCI, to identify the interactions between various types of RNAs and chromatin. In this framework, we introduce a novel deep learning component called variformer and integrate multi-omics data to capture intrinsic genomic features at both RNA and DNA levels.
Results: Extensive experiments demonstrate that DeepRCI can detect RNA-chromatin interactions more accurately when compared to the state-of-the-art baseline prediction methods. Furthermore, the sequence features extracted by DeepRCI can be well matched to known critical gene regulatory components, indicating that our model can provide useful biological insights into understanding the underlying mechanisms of RNA-chromatin interactions. In addition, based on the prediction results, we further delineate the relationships between RNA-chromatin interactions and cellular functions, including gene expression and the modulation of cell states.
Conclusions: In summary, DeepRCI can serve as a useful tool for characterizing RNA-chromatin interactions and studying the underlying gene regulatory code.
Background: Oxford Nanopore long-read sequencing technology addresses current limitations for DNA methylation detection that are inherent in short-read bisulfite sequencing or methylation microarrays. A number of analytical tools, such as Nanopolish, Guppy/Tombo and DeepMod, have been developed to detect DNA methylation in Nanopore data. However, additional improvements can be made in computational efficiency, prediction accuracy, and contextual interpretation in complex genomic regions (such as repetitive regions and low-GC-density regions).
Method: In the current study, we apply the Transformer architecture to detect DNA methylation from the ionic signals in Oxford Nanopore sequencing data. The Transformer is a neural network architecture built on self-attention that has been widely used in natural language processing.
Results: Compared to traditional deep-learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the Transformer may have specific advantages in DNA methylation detection, because the self-attention mechanism can help detect relationships between bases that are far from each other and pay more attention to important bases that carry characteristic methylation-specific signals within a specific sequence context.
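As a rough illustration of why self-attention can relate distant positions, the sketch below computes scaled dot-product self-attention over a short sequence in plain Python. It assumes identity projections (queries, keys, and values are all the input vectors themselves) and toy two-dimensional embeddings; the function names are hypothetical and this is not the model used in the study.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of equal-length vectors, one per sequence position.
    Returns (attention weights, attended outputs).
    """
    d = len(x[0])
    # Similarity of every position with every other position.
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d)
               for kj in x] for qi in x]
    weights = [softmax(r) for r in scores]
    # Each output is a weighted average of all input vectors.
    out = [[sum(w * v[c] for w, v in zip(wr, x)) for c in range(d)]
           for wr in weights]
    return weights, out
```

Because every position is scored against every other position in a single step, the first and last elements of a sequence attend to each other as strongly as adjacent elements with the same embedding would, regardless of the distance between them.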
Conclusion: We demonstrated the ability of Transformers to detect methylation on ionic signal data.
Background: The existence of doublets in single-cell RNA sequencing (scRNA-seq) data poses a great challenge in downstream data analysis. Computational doublet-detection methods have been developed to remove doublets from scRNA-seq data. Yet, the default hyperparameter settings of those methods may not provide optimal performance.
Methods: We propose a strategy to tune hyperparameters for a cutting-edge doublet-detection method. We utilize a full factorial design to explore the relationship between hyperparameters and detection accuracy on 16 real scRNA-seq datasets. The optimal hyperparameters are obtained by a response surface model and convex optimization.
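The first step of this strategy, a full factorial design, can be sketched as follows. The hyperparameter names and levels used in the usage note, the `score` callback, and the helper names are hypothetical placeholders; the response surface model and the convex optimization step are not reproduced here, and the sketch simply picks the best grid point by mean score.

```python
from itertools import product
from statistics import mean


def full_factorial(levels):
    """Enumerate every combination of hyperparameter levels.

    levels maps a hyperparameter name to the list of values to try.
    """
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[n] for n in names))]


def best_setting(levels, score, datasets):
    """Return the grid point with the highest mean score across datasets.

    score(setting, dataset) is a user-supplied function returning a
    detection-accuracy value (higher is better).
    """
    design = full_factorial(levels)
    return max(design, key=lambda s: mean(score(s, d) for d in datasets))
```

For example, `full_factorial({"k": [10, 20, 30], "r": [0.1, 0.2]})` yields all six combinations of the two hypothetical hyperparameters. A fitted response surface would additionally allow interpolation between grid points, which plain enumeration cannot do.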
Results: We show that the optimal hyperparameters provide top performance across scRNA-seq datasets under various biological conditions. Our tuning strategy can be applied to other computational doublet-detection methods. It also offers insights into hyperparameter tuning for broader computational methods in scRNA-seq data analysis.
Conclusions: The hyperparameter configuration significantly impacts the performance of computational doublet-detection methods. Our study is the first attempt to systematically explore the optimal hyperparameters under various biological conditions and optimization objectives. Our study provides much-needed guidance for hyperparameter tuning in computational doublet-detection methods.
Background: Living cells need to undergo subtle shape adaptations in response to the topography of their substrates. These shape changes are mainly determined by reorganization of their internal cytoskeleton, with a major contribution from filamentous (F) actin. Bundles of F-actin play a major role in determining cell shape and their interaction with substrates, either as “stress fibers,” or as our newly discovered “Concave Actin Bundles” (CABs), which mainly occur while endothelial cells wrap micro-fibers in culture.
Methods: To better understand the morphology and functions of these CABs, it is necessary to recognize and analyze as many of them as possible in complex cellular ensembles, which is a demanding and time-consuming task. In this study, we present a novel algorithm to automatically recognize CABs without further human intervention. We developed and employed a multilayer perceptron artificial neural network (“the recognizer”), which was trained to identify CABs.
Results: The recognizer demonstrated a high overall recognition rate and reliability in both randomized training and subsequent testing experiments.
Conclusion: The recognizer could effectively replace validation by visual detection, which is both tedious and inherently prone to errors.
Background: Molecular docking-based virtual screening (VS) aims to choose ligands with potential pharmacological activities from millions or even billions of molecules. This process could significantly cut down the number of compounds that need to be experimentally tested. However, many molecules have low affinity for a particular protein target, so docking them wastes substantial computational resources.
Methods: We implemented a fast and practical molecular screening approach called DL-DockVS (deep learning dock virtual screening) by using deep learning models (regression and classification models) to learn the outcomes of pipelined docking programs step-by-step.
Results: In this study, we showed that this approach could successfully weed out compounds with poor docking scores while keeping compounds with potentially high docking scores against 10 DUD-E protein targets. A self-built dataset of about 1.9 million molecules was used to further verify DL-DockVS, yielding good results in terms of recall rate, active-compound enrichment factor, and runtime speed.
Conclusions: We comprehensively evaluated the practicality and effectiveness of DL-DockVS against 10 protein targets. Because it improves runtime while maintaining the success rate, it would be a useful and promising approach for screening ultra-large compound libraries in the age of big data. Researchers can also conveniently reuse a model trained for one specific target to screen other chemical libraries and identify molecules with high docking scores without repeating the docking computation.
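The recall rate and enrichment factor mentioned here are commonly computed as shown in the sketch below. The function names and the compound identifiers are illustrative assumptions, and the exact definitions used in the study may differ (for example, in how the selected fraction is chosen).

```python
def recall(selected, actives):
    """Fraction of all active compounds that survive the screen."""
    selected, actives = set(selected), set(actives)
    return len(selected & actives) / len(actives)


def enrichment_factor(selected, actives, library_size):
    """Hit rate among selected compounds divided by the library hit rate.

    A value above 1 means the screen concentrates actives better than
    picking the same number of compounds at random.
    """
    selected, actives = set(selected), set(actives)
    hit_rate_selected = len(selected & actives) / len(selected)
    hit_rate_library = len(actives) / library_size
    return hit_rate_selected / hit_rate_library
```

As a toy case, if a library of 100 compounds holds 10 actives and a screen keeps 10 compounds of which 5 are active, the recall is 0.5 and the enrichment factor is (5/10) / (10/100) = 5.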
Background: Massively parallel sequencing of environmental DNA allows microbiological studies to be performed in greater detail than was possible with first-generation sequencing. For example, it facilitates the use of approaches hitherto largely applied to flora and fauna, such as rank abundance distribution (RAD) analyses.
Methods: Here, we set out to advance the knowledge on Ca. Pelagibacterales (SAR11) communities from southern South America using environmental sequences from the open ocean in the Argentine Sea, the uncharted Engaño Bay, as well as a river and an oligohaline shallow lake from the Patagonian Steppe ecoregion. The structures of the SAR11 assemblages present in these ecosystems were dissected by direct and rarefaction-based estimates of species richness, and by evaluations of the corresponding abundance distributions (ADs), which were addressed by RAD analyses.
Results: Microbial community composition analyses revealed that the studied SAR11 assemblages coexist with 27 bacterial phyla. SAR11 richness was in general very high, but ADs turned out to be highly uneven. The results were compatible with prior knowledge, and similar to those derived from point estimates of diversity. However, our comprehensive dissection allowed for more detailed quantitative comparisons to be made between the environments surveyed, and revealed differences regarding both richness and the underlying ADs.
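A rank abundance distribution and a rarefaction-based richness estimate can be sketched in a few lines of Python. The rarefaction estimator below is the classical expected-richness formula for subsampling without replacement, which is an assumption here since the text does not name the estimator used; the function names and any counts passed in are hypothetical.

```python
from math import comb


def rank_abundance(counts):
    """Relative abundances sorted from most to least abundant taxon."""
    total = sum(counts)
    return [c / total for c in sorted(counts, reverse=True)]


def rarefied_richness(counts, n):
    """Expected number of taxa in a random subsample of n sequences.

    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N is the
    total count and N_i is the count of taxon i.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total count")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)
```

A steeply falling `rank_abundance` curve indicates an uneven assemblage, while evaluating `rarefied_richness` at a common depth `n` lets richness be compared between samples sequenced to different depths.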
Conclusions: Despite SAR11 assemblages being extremely rich, their ADs are very uneven. Richness and ADs can vary, not only between fresh and salt water, but also between oceanic and coastal marine environments. The obtained results provide insights on general topics such as adaptation and the contrast between marine and freshwater radiations.
Background: Accumulating evidence shows that long non-coding RNAs (lncRNAs) play critical roles in cancer progression. The possible association between lncRNAs and herbal medicine is not yet known. This study aims to identify medicinal herbs associated with lncRNAs using RNA-seq data for breast and prostate cancer.
Methods: To develop the optimal approach for identifying cancer-related lncRNAs, we implemented two steps: (1) applying protein–protein interaction (PPI), Gene Ontology (GO), and pathway analyses, and (2) applying attribute weighting and identifying the most efficient machine learning classification model.
Results: In the first step, GO term and pathway analyses on differentially co-expressed mRNAs revealed that lncRNAs were widely co-expressed with metabolic process genes. We identified two hub lncRNA-mRNA networks that implicate lncRNAs associated with breast and prostate cancer. In the second step, we implemented various machine learning-based prediction systems (Decision Tree, Random Forest, Deep Learning, and Gradient-Boosted Tree) on the non-transformed and Z-standardized differentially co-expressed lncRNAs. Based on five-fold cross-validation, we obtained high accuracy (91.11%), high sensitivity (88.33%), and high specificity (93.33%) with Deep Learning, which reinforces the biomarker power of the lncRNAs identified in this study. As the data originally came from different cell lines at different durations of herbal treatment intervention, we applied seven attribute weighting algorithms to check the effects of these variables on identifying lncRNAs. Attribute weighting results showed that the cell line and time had little or no effect on the selected lncRNA list. In addition, we identified one known lncRNA, downregulated RNA in cancer (DRAIC), as an essential feature.
Conclusions: This study provides further insights for investigating potential therapeutic and prognostic targets shared by prostate cancer (PC) and breast cancer (BC).
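For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix, and how Z-standardization rescales a feature. The function names are hypothetical, any counts passed in are the caller's own and not the study's data, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


def z_standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]
```

Under five-fold cross-validation, such metrics would typically be computed on each held-out fold and then averaged, or computed once on the pooled held-out predictions.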