Quant. Biol.

Most cited

REVIEW

Progress in molecular docking

Jiyu Fan, Ailing Fu, Le Zhang

Quantitative Biology, 2019, 7(2): 83-89. https://doi.org/10.1007/s40484-019-0172-y

Background: Because molecular docking can greatly improve efficiency and reduce research cost, it has in recent years become a key tool in computer-assisted drug design for predicting binding affinity and analyzing interaction modes.
Results: This study introduces the key principles, procedures and widely used applications of molecular docking. It also compares the commonly used docking applications and recommends the research areas for which each is suitable. Lastly, it briefly reviews the latest progress in molecular docking, such as integrated methods and deep learning.
Conclusion: Limited by incomplete molecular structures and the shortcomings of scoring functions, current docking applications are not accurate enough to predict binding affinity. However, the current molecular docking technique could be improved by integrating big biological data into the scoring function.
RESEARCH ARTICLE

Identifying viruses from metagenomic data using deep learning

Jie Ren, Kai Song, Chao Deng, Nathan A. Ahlgren, Jed A. Fuhrman, Yi Li, Xiaohui Xie, Ryan Poplin, Fengzhu Sun

Quantitative Biology, 2020, 8(1): 64-77. https://doi.org/10.1007/s40484-019-0187-4

Background: The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes including viral genomes without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences from metagenomic data.

Methods: Here we developed a reference-free and alignment-free machine learning method, DeepVirFinder, for identifying viral sequences in metagenomic data using deep learning.

Results: Trained based on sequences from viral RefSeq discovered before May 2015, and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences respectively. Enlarging the training data with additional millions of purified viral sequences from metavirome samples further improved the accuracy for identifying virus groups that are under-represented. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found associated with the cancer status, suggesting viruses may play important roles in CRC.

Conclusions: Powered by deep learning and high throughput sequencing metagenomic data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
RESEARCH ARTICLE

Modeling the epidemic dynamics and control of COVID-19 outbreak in China

Shilei Zhao, Hua Chen

Quantitative Biology, 2020, 8(1): 11-19. https://doi.org/10.1007/s40484-020-0199-0

Background: The coronavirus disease 2019 (COVID-19) has been spreading rapidly in China and more than 30 other countries over the last two months. COVID-19 has multiple characteristics distinct from other infectious diseases, including high infectivity during incubation, a time delay between the real dynamics and the daily observed number of confirmed cases, and the intervention effects of implemented quarantine and control measures.

Methods: We develop a Susceptible, Un-quarantined infected, Quarantined infected, Confirmed infected (SUQC) model to characterize the dynamics of COVID-19 and explicitly parameterize the intervention effects of control measures, which is more suitable for analysis than other existing epidemic models.

Results: The SUQC model is applied to the daily released data of the confirmed infections to analyze the outbreak of COVID-19 in Wuhan, Hubei (excluding Wuhan), China (excluding Hubei) and four first-tier cities of China. We found that, before January 30, 2020, all these regions except Beijing had a reproductive number R > 1, and after January 30, all regions had a reproductive number R < 1, indicating that the quarantine and control measures are effective in preventing the spread of COVID-19. The confirmation rate of Wuhan estimated by our model is 0.0643, substantially lower than that of Hubei excluding Wuhan (0.1914), and that of China excluding Hubei (0.2189), but it jumps to 0.3229 after February 12 when clinical evidence was adopted in new diagnosis guidelines. The number of un-quarantined infected cases in Wuhan on February 12, 2020 is estimated to be 3,509 and declines to 334 on February 21, 2020. After fitting the model with data as of February 21, 2020, we predict that the end time of COVID-19 in Wuhan and Hubei is around late March, around mid March for China excluding Hubei, and before early March 2020 for the four tier-one cities. A total of 80,511 individuals are estimated to be infected in China, among which 49,510 are from Wuhan, 17,679 from Hubei (excluding Wuhan), and the remaining 13,322 from other regions of China (excluding Hubei). Note that the estimates are from a deterministic ODE model and should be interpreted with some uncertainty.

Conclusions: We suggest that rigorous quarantine and control measures should be kept before early March in Beijing, Shanghai, Guangzhou and Shenzhen, and before late March in Hubei. The model can also be useful to predict the trend of epidemic and provide quantitative guide for other countries at high risk of outbreak, such as South Korea, Japan, Italy and Iran.
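
A four-compartment model of this kind can be integrated numerically in a few lines. The sketch below is illustrative only: the abstract does not state the equations, so the ODE form, the parameter names (`alpha` for the infection rate, `gamma1` for the quarantine rate, `beta` for the confirmation rate) and every numeric value are assumptions, not the authors' fitted model.

```python
def simulate_suqc(n, u0, alpha, gamma1, beta, days, dt=0.1):
    """Forward-Euler integration of an assumed SUQC-style system.

    Assumed equations (not taken from the abstract):
        dS/dt = -alpha * U * S / N
        dU/dt =  alpha * U * S / N - gamma1 * U
        dQ/dt =  gamma1 * U - beta * Q
        dC/dt =  beta * Q
    """
    s, u, q, c = float(n - u0), float(u0), 0.0, 0.0
    steps = int(round(days / dt))
    for _ in range(steps):
        new_infected = alpha * u * s / n * dt   # S -> U
        new_quarantined = gamma1 * u * dt       # U -> Q
        new_confirmed = beta * q * dt           # Q -> C
        s -= new_infected
        u += new_infected - new_quarantined
        q += new_quarantined - new_confirmed
        c += new_confirmed
    return s, u, q, c


# Hypothetical parameter values chosen only to exercise the code.
s, u, q, c = simulate_suqc(n=1_000_000, u0=100, alpha=0.3,
                           gamma1=0.4, beta=0.2, days=60)
```

Under these assumed equations the un-quarantined pool shrinks whenever `alpha * S / N < gamma1`, which is one way a quarantine rate can be read as a control parameter.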
REVIEW

Target specificity of the CRISPR-Cas9 system

Xuebing Wu, Andrea J. Kriz, Phillip A. Sharp

Quantitative Biology, 2014, 2(2): 59-70. https://doi.org/10.1007/s40484-014-0030-x

The CRISPR-Cas9 system, naturally a defense mechanism in prokaryotes, has been repurposed as an RNA-guided DNA targeting platform. It has been widely used for genome editing and transcriptome modulation, and has shown great promise in correcting mutations in human genetic diseases. Off-target effects are a critical issue for all of these applications. Here we review the current status on the target specificity of the CRISPR-Cas9 system.
REVIEW

Performance measures in evaluating machine learning based bioinformatics predictors for classifications

Yasen Jiao, Pufeng Du

Quantitative Biology, 2016, 4(4): 320-330. https://doi.org/10.1007/s40484-016-0081-2

Background: Many existing bioinformatics predictors are based on machine learning technology. When applying these predictors in practical studies, their predictive performances should be well understood. Different performance measures are applied in various studies as well as different evaluation methods. Even for the same performance measure, different terms, nomenclatures or notations may appear in different context.

Results: We carried out a review on the most commonly used performance measures and the evaluation methods for bioinformatics predictors.

Conclusions: It is important in bioinformatics to correctly understand and interpret the performance, as it is the key to rigorously compare performances of different predictors and to choose the right predictor.
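
As a concrete reference point, several widely used measures for a binary classifier can be computed from the four confusion-matrix counts. This is a minimal sketch; the function name and the example counts are made up for illustration, and it assumes both classes and both predicted labels are non-empty.

```python
import math


def classification_measures(tp, fp, tn, fn):
    """Common binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # also called recall or true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts for illustration.
m = classification_measures(tp=40, fp=10, tn=45, fn=5)
```

The comments list alternative names for the same quantity, which is the kind of terminological overlap the abstract warns about.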
REVIEW

Modeling the specificity of protein-DNA interactions

Gary D. Stormo

Quantitative Biology, 2013, 1(2): 115-130. https://doi.org/10.1007/s40484-013-0012-4

The specificity of protein-DNA interactions is most commonly modeled using position weight matrices (PWMs). First introduced in 1982, they have been adapted to many new types of data and many different approaches have been developed to determine the parameters of the PWM. New high-throughput technologies provide a large amount of data rapidly and offer an unprecedented opportunity to determine accurately the specificities of many transcription factors (TFs). But taking full advantage of the new data requires advanced algorithms that take into account the biophysical processes involved in generating the data. The new large datasets can also aid in determining when the PWM model is inadequate and must be extended to provide accurate predictions of binding sites. This article provides a general mathematical description of a PWM and how it is used to score potential binding sites, a brief history of the approaches that have been developed and the types of data that are used with an emphasis on algorithms that we have developed for analyzing high-throughput datasets from several new technologies. It also describes extensions that can be added when the simple PWM model is inadequate and further enhancements that may be necessary. It briefly describes some applications of PWMs in the discovery and modeling of in vivo regulatory networks.
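
To make the idea of scoring a candidate site with a PWM concrete, here is a minimal log-odds sketch. It is one common formulation and not necessarily the one used in the article; the uniform background, the pseudocount, the function names and the example sites are all assumptions for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned binding sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_match(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((score_site(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Made-up aligned sites, for illustration only.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
score, start = best_match(pwm, "GGCGTATAATGCC")
```

The additive score is what makes the simple PWM fast to scan with, and the independence assumption behind it is what the extensions mentioned above relax.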
RESEARCH ARTICLE

Predicting enhancer-promoter interaction from genomic sequence with deep neural networks

Shashank Singh, Yang Yang, Barnabás Póczos, Jian Ma

Quantitative Biology, 2019, 7(2): 122-137. https://doi.org/10.1007/s40484-019-0154-0

Background: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions.
Methods: Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given.
Results: Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions as compared to state-of-the-art methods that only use information from a single cell type. As a proof-of-principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes.
Conclusions: This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.
REVIEW

Current challenges and solutions of de novo assembly

Xingyu Liao, Min Li, You Zou, Fang-Xiang Wu, Yi Pan, Jianxin Wang

Quantitative Biology, 2019, 7(2): 90-109. https://doi.org/10.1007/s40484-019-0166-9

Background: Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. However, numerous technical or computational challenges in de novo assembly still remain, although many new ideas and solutions have been suggested to tackle the challenges in both experimental and computational settings.
Results: In this review, we first briefly introduce some of the major challenges faced by NGS sequence assembly. Then, we analyze the characteristics of various sequencing platforms and their impact on assembly results. After that, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based and string graph-based), and introduce the characteristics of each assembly tool and the scenarios to which it is suited. Next, we introduce in detail the solutions to the main challenges of de novo assembly of next-generation sequencing data, single-cell sequencing data and single-molecule sequencing (SMS) data. Finally, we discuss the application of SMS long reads in solving problems encountered in NGS assembly.
Conclusions: This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines to determine the optimal assembly algorithm for a given input sequencing data type.
RESEARCH ARTICLE

Variable importance-weighted Random Forests

Yiyi Liu, Hongyu Zhao

Quantitative Biology, 2017, 5(4): 338-351. https://doi.org/10.1007/s40484-017-0121-6

Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which uses only the features with the largest variable importance scores. Yet the performance of this method is not satisfactory, possibly due to its rigid feature selection and increased correlations between trees in the forest.

Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node to build up trees, samples features according to their variable importance scores and then selects the best split from the randomly selected features.

Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared to the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases.

Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones, hence has improved prediction accuracy in the presence of weak signals and large noises. We have implemented an R package “viRandomForests” based on the original R package “randomForest” and it can be freely downloaded from http://zhaocenter.org/software.
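
The node-level sampling step described in the Methods can be sketched as a weighted draw of candidate features. This is an illustration of the idea only, not the viRandomForests implementation: drawing without replacement, clipping negative importance scores to zero, and the small floor `eps` are all assumptions made here.

```python
import random


def sample_features(importances, mtry, rng, eps=1e-12):
    """Draw `mtry` distinct feature indices, weighted by importance score.

    Assumptions of this sketch: sampling is without replacement, negative
    scores are clipped to zero, and `eps` keeps every feature drawable.
    """
    candidates = list(range(len(importances)))
    weights = [max(w, 0.0) + eps for w in importances]
    chosen = []
    for _ in range(mtry):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))  # remove so it cannot be drawn again
        weights.pop(pick)
    return chosen


# Hypothetical importance scores for four features.
rng = random.Random(0)
picks = sample_features([5.0, 1.0, 0.1, 0.1], mtry=2, rng=rng)
```

Because low-importance features keep a small but non-zero probability, they are down-weighted rather than eliminated, which is the contrast with feature elimination that the Conclusions draw.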
REVIEW

Modeling and analysis of RNA-seq data: a review from a statistical perspective

Wei Vivian Li, Jingyi Jessica Li

Quantitative Biology, 2018, 6(3): 195-209. https://doi.org/10.1007/s40484-018-0144-7

Background: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool to study the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date.
Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical concern.
Conclusions: The development of statistical and computational methods for analyzing RNA-seq data has made significant advances in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models regarding their assumptions, in the hope of helping users select appropriate methods as needed, as well as assisting developers for future method development.
REVIEW

Dynamical network biomarkers for identifying critical transitions and their driving networks of biologic processes

Rui Liu, Kazuyuki Aihara, Luonan Chen

Quantitative Biology, 2013, 1(2): 105-114. https://doi.org/10.1007/s40484-013-0008-0

Non-smooth or even abrupt state changes exist during many biological processes, e.g., cell differentiation processes, proliferation processes, or even disease deterioration processes. Such dynamics generally signal the emergence of critical transition phenomena, which result in drastic changes of system states or eventually qualitative changes of phenotypes. Hence, it is of great importance to detect such transitions and further reveal their molecular mechanisms at the network level. Here, we review the recent advances in dynamical network biomarkers (DNBs) as well as the related theoretical foundation, which can identify not only early signals of the critical transitions but also their leading networks, which drive the whole system to initiate such transitions. To demonstrate the effectiveness of this novel approach, examples from complex diseases are also provided in which the pre-disease stage is detected where traditional methods or biomarkers failed.
REVIEW

Mendelian randomization and pleiotropy analysis

Xiaofeng Zhu

Quantitative Biology, 2021, 9(2): 122-132. https://doi.org/10.1007/s40484-020-0216-3

Background: Mendelian randomization (MR) analysis has become popular in inferring and estimating the causality of an exposure on an outcome due to the success of genome-wide association studies. Many statistical approaches have been developed, and each of these methods requires specific assumptions.

Results: In this article, we review the pros and cons of these methods. We use an example of high-density lipoprotein cholesterol on coronary artery disease to illuminate the challenges in Mendelian randomization investigation.

Conclusion: The currently available MR approaches allow us to study causality among risk factors and outcomes. However, novel approaches are desirable for overcoming multiple-source confounding of risk factors and an outcome in MR analysis.
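
For orientation, the simplest MR estimator is the per-variant Wald ratio, and a common way to combine several variants is a fixed-effect inverse-variance weighted (IVW) average. The abstract does not name specific methods, so treat this as a generic sketch; the function names and the summary statistics are made up, and the standard error uses a first-order approximation that ignores uncertainty in the variant-exposure estimate.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal effect estimate from one variant: outcome effect / exposure effect."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)  # first-order approximation
    return estimate, se


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted mean of per-variant Wald ratios.

    `variants` is an iterable of (beta_exposure, beta_outcome, se_outcome).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for beta_x, beta_y, se_y in variants:
        estimate, se = wald_ratio(beta_x, beta_y, se_y)
        weight = 1.0 / se ** 2
        weighted_sum += weight * estimate
        weight_total += weight
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Made-up summary statistics for two variants.
effect, effect_se = ivw_estimate([(0.2, 0.1, 0.02), (0.4, 0.2, 0.03)])
```

This estimator is only valid when every variant affects the outcome solely through the exposure; pleiotropy breaks that assumption, which is why the choice of method matters.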
RESEARCH ARTICLE

Imputation of single-cell gene expression with an autoencoder neural network

Md. Bahadur Badsha, Rui Li, Boxiang Liu, Yang I. Li, Min Xian, Nicholas E. Banovich, Audrey Qiuyan Fu

Quantitative Biology, 2020, 8(1): 78-94. https://doi.org/10.1007/s40484-019-0192-7

Background: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed by a single experiment, scRNA-seq still has several limitations, including high rates of dropouts, which result in a large number of genes having zero read count in the scRNA-seq data, and complicate downstream analyses.

Methods: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.

Results: On both simulated and real data, LATE and TRANSLATE outperform existing scRNA-seq imputation methods, achieving lower mean squared error in most cases, recovering nonlinear gene-gene relationships, and better separating cell types. They are also highly scalable and can efficiently process over 1 million cells in just a few hours on a GPU.

Conclusions: We demonstrate that our nonparametric approach to imputation based on autoencoders is powerful and highly efficient.
RESEARCH ARTICLE

Bistability and oscillations in co-repressive synthetic microbial consortia

Mehdi Sadeghpour, Alan Veliz-Cuba, Gábor Orosz, Krešimir Josić, Matthew R. Bennett

Quantitative Biology, 2017, 5(1): 55-66. https://doi.org/10.1007/s40484-017-0100-y

Background: Synthetic microbial consortia are conglomerations of genetically engineered microbes programmed to cooperatively bring about population-level phenotypes. By coordinating their activity, the constituent strains can display emergent behaviors that are difficult to engineer into isogenic populations. To do so, strains are engineered to communicate with one another through intercellular signaling pathways that depend on cell density.

Methods: Here, we used computational modeling to examine how the behavior of synthetic microbial consortia results from the interplay between population dynamics governed by cell growth and internal transcriptional dynamics governed by cell-cell signaling. Specifically, we examined a synthetic microbial consortium in which two strains each produce signals that down-regulate transcription in the other. Within a single strain this regulatory topology is called a “co-repressive toggle switch” and can lead to bistability.

Results: We found that in co-repressive synthetic microbial consortia the existence and stability of different states depend on population-level dynamics. As the two strains passively compete for space within the colony, their relative fractions fluctuate and thus alter the strengths of intercellular signals. These fluctuations drive the consortium to alternative equilibria. Additionally, if the growth rates of the strains depend on their transcriptional states, an additional feedback loop is created that can generate oscillations.

Conclusions: Our findings demonstrate that the dynamics of microbial consortia cannot be predicted from their regulatory topologies alone, but are also determined by interactions between the strains. Therefore, when designing synthetic microbial consortia that use intercellular signaling, one must account for growth variations caused by the production of protein.
REVIEW

Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era

Raffaella Rizzi, Stefano Beretta, Murray Patterson, Yuri Pirola, Marco Previtali, Gianluca Della Vedova, Paola Bonizzoni

Quantitative Biology, 2019, 7(4): 278-292. https://doi.org/10.1007/s40484-019-0181-x

Background: De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn graphs have become the dominant technical device in the last decade. Those two kinds of graphs are collectively called assembly graphs.
Results: In this review, we discuss the most recent advances in the problem of constructing, representing and navigating assembly graphs, focusing on very large datasets. We will also explore some computational techniques, such as the Bloom filter, to compactly store graphs while keeping all functionalities intact.
Conclusions: We complete our analysis with a discussion on the algorithmic issues of assembling from long reads (e.g., PacBio and Oxford Nanopore). Finally, we present some of the most relevant open problems in this field.
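
As a small illustration of one of the two graph types, a de Bruijn graph can be built by sliding a window of length k over each read and linking every k-mer's prefix to its suffix. This is a toy sketch, not a memory-efficient representation of the kind the review surveys: it ignores reverse complements and sequencing errors, and storing edges in a set discards k-mer multiplicity. The reads are made up.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


# Two made-up overlapping reads.
reads = ["ACGTC", "GTCA"]
graph = de_bruijn_graph(reads, k=3)
```

Here the k-mer `GTC` occurs in both reads but yields a single edge, which shows how overlaps between reads are captured implicitly rather than by pairwise comparison as in an overlap graph.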
PERSPECTIVE

Empowering beginners in bioinformatics with ChatGPT

Evelyn Shue, Li Liu, Bingxin Li, Zifeng Feng, Xin Li, Gangqing Hu

Quantitative Biology, 2023, 11(2): 105-108. https://doi.org/10.15302/J-QB-023-0327

The impressive conversational and programming abilities of ChatGPT make it an attractive tool for facilitating the education of bioinformatics data analysis for beginners. In this study, we proposed an iterative model to fine-tune instructions for guiding a chatbot in generating code for bioinformatics data analysis tasks. We demonstrated the feasibility of the model by applying it to various bioinformatics topics. Additionally, we discussed practical considerations and limitations regarding the use of the model in chatbot-aided bioinformatics education.
REVIEW

Transcriptome-wide association studies: a view from Mendelian randomization

Huanhuan Zhu, Xiang Zhou

Quantitative Biology, 2021, 9(2): 107-121. https://doi.org/10.1007/s40484-020-0207-4

Background: Genome-wide association studies (GWASs) have identified thousands of genetic variants that are associated with many complex traits. However, their biological mechanisms remain largely unknown. Transcriptome-wide association studies (TWAS) have been recently proposed as an invaluable tool for investigating the potential gene regulatory mechanisms underlying variant-trait associations. Specifically, TWAS integrate GWAS with expression mapping studies based on a common set of variants and aim to identify genes whose GReX is associated with the phenotype. Various methods have been developed for performing TWAS and/or similar integrative analysis. Each such method has a different modeling assumption and many were initially developed to answer different biological questions. Consequently, it is not straightforward to understand their modeling property from a theoretical perspective.

Results: We present a technical review on thirteen TWAS methods. Importantly, we show that these methods can all be viewed as two-sample Mendelian randomization (MR) analysis, which has been widely applied in GWASs for examining the causal effects of exposure on outcome. Viewing different TWAS methods from an MR perspective provides us a unique angle for understanding their benefits and pitfalls. We systematically introduce the MR analysis framework, explain how features of the GWAS and expression data influence the adaptation of MR for TWAS, and re-interpret the modeling assumptions made in different TWAS methods from an MR angle. We finally describe future directions for TWAS methodology development.

Conclusions: We hope that this review would serve as a useful reference for both methodologists who develop TWAS methods and practitioners who perform TWAS analysis.
REVIEW

Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions

Krishna Choudhary, Fei Deng, Sharon Aviran

Quantitative Biology, 2017, 5(1): 3-24. https://doi.org/10.1007/s40484-017-0093-6

Background: Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with application of high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data with ever-increasing diversity and complexity have been generated, which give rise to new challenges in interpreting and analyzing these data.

Results: We review current practices in analysis of structure profiling data with emphasis on comparative and integrative analysis as well as highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy.

Conclusions: To keep pace with experimental developments, methods to facilitate, enhance and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
REVIEW

Phage engineering: how advances in molecular biology and synthetic biology are being utilized to enhance the therapeutic potential of bacteriophages

Russell Brown, Andreas Lengeling, Baojun Wang

Quantitative Biology, 2017, 5(1): 42-54. https://doi.org/10.1007/s40484-017-0094-5

Background: The therapeutic potential of bacteriophages has been debated since their first isolation and characterisation in the early 20th century. However, a lack of consistency in application and observed efficacy during their early use meant that upon the discovery of antibiotic compounds research in the field of phage therapy quickly slowed. The rise of antibiotic resistance in bacteria and improvements in our abilities to modify and manipulate DNA, especially in the context of small viral genomes, has led to a recent resurgence of interest in utilising phage as antimicrobial therapeutics.

Results: This article introduces and discusses a number of results from the literature that use molecular biology and synthetic biology approaches to address key issues regarding the utility and efficacy of phage as antimicrobial therapeutics, giving a general view of recent progress in the field.

Conclusions: Advances in molecular biology and synthetic biology have enabled rapid progress in the field of phage engineering, with this article highlighting a number of promising strategies developed to optimise phages for the treatment of bacterial disease. Whilst many of the same issues that have historically limited the use of phages as therapeutics still exist, these modifications, or combinations thereof, may form a basis upon which future advances can be built. A focus on rigorous in vivo testing and investment in clinical trials for promising candidate phages may be required for the field to truly mature, but there is renewed hope that the potential benefits of phage therapy may finally be realised.
REVIEW

Recent advances in molecular machines based on toehold-mediated strand displacement reaction

Yijun Guo, Bing Wei, Shiyan Xiao, Dongbao Yao, Hui Li, Huaguo Xu, Tingjie Song, Xiang Li, Haojun Liang

Quantitative Biology, 2017, 5(1): 25-41. https://doi.org/10.1007/s40484-017-0097-2

Background: The DNA strand displacement reaction, which uses flexible and programmable DNA molecules as reaction components, is the basis of dynamic DNA nanotechnology, and has been widely used in the design of complex autonomous behaviors.

Results: In this review, we first briefly introduce the concept of the toehold-mediated strand displacement reaction and the regulation of its kinetics in pure solution. Thereafter, we review recent progress in complex DNA circuits, the assembly of AuNPs driven by DNA molecular machines, and the detection of single nucleotide polymorphisms (SNPs) using DNA toehold exchange probes in pure solution and in the interface state. Lastly, the applications of toehold-mediated strand displacement in genetic regulation and silencing through combining gene circuits with RNA interference systems are reviewed.

Conclusions: The toehold-mediated strand displacement reaction makes DNA an excellent material for the fabrication of molecular machines and complex circuits, and may potentially be used in disease diagnosis and the regulation of gene silencing in the near future.
RESEARCH ARTICLE

Analysis of alternative cleavage and polyadenylation in mature and differentiating neurons using RNA-seq data

Aysegul Guvenek, Bin Tian

Quantitative Biology, 2018, 6(3): 253-266. https://doi.org/10.1007/s40484-018-0148-3

Background: Most eukaryotic protein-coding genes exhibit alternative cleavage and polyadenylation (APA), resulting in mRNA isoforms with different 3′ untranslated regions (3′ UTRs). Studies have shown that brain cells tend to express long 3′ UTR isoforms using distal cleavage and polyadenylation sites (PASs).

Methods: Using our recently developed, comprehensive PAS database PolyA_DB, we developed an efficient method to examine APA, named Significance Analysis of Alternative Polyadenylation using RNA-seq (SAAP-RS). We applied this method to study APA in brain cells and neurogenesis.

Results: We found that neurons globally express longer 3′ UTRs than other cell types in brain, and microglia and endothelial cells express substantially shorter 3′ UTRs. We show that the 3′ UTR diversity across brain cells can be corroborated with single cell sequencing data. Further analysis of APA regulation of 3′ UTRs during differentiation of embryonic stem cells into neurons indicates that a large fraction of the APA events regulated in neurogenesis are similarly modulated in myogenesis, but to a much greater extent.

Conclusion: Together, our data delineate APA profiles in different brain cells and indicate that APA regulation in neurogenesis is largely an augmented process taking place in other types of cell differentiation.
REVIEW

Systems and synthetic biology approaches in understanding biological oscillators

Zhengda Li, Qiong Yang

Quantitative Biology, 2018, 6(1): 1-14. https://doi.org/10.1007/s40484-017-0120-7

Background: Self-sustained oscillations are a ubiquitous and vital phenomenon in living systems. From primitive single-cellular bacteria to the most sophisticated organisms, periodicities have been observed in a broad spectrum of biological processes such as neuron firing, heart beats, cell cycles, circadian rhythms, etc. Defects in these oscillators can cause diseases from insomnia to cancer. Elucidating their fundamental mechanisms is of great significance to diseases, and yet challenging, due to the complexity and diversity of these oscillators.
Results: Approaches in quantitative systems biology and synthetic biology have been most effective by simplifying the systems to contain only the most essential regulators. Here, we will review major progress that has been made in understanding biological oscillators using these approaches. The quantitative systems biology approach allows for identification of the essential components of an oscillator in an endogenous system. The synthetic biology approach makes use of the knowledge to design the simplest, de novo oscillators in both live cells and cell-free systems. These synthetic oscillators are tractable to further detailed analysis and manipulations.
Conclusion: With the recent development of biological and computational tools, both approaches have made significant achievements.
REVIEW

A survey of web resources and tools for the study of TCM network pharmacology

Jing Zhao, Jian Yang, Saisai Tian, Weidong Zhang

Quantitative Biology, 2019, 7(1): 17-29. https://doi.org/10.1007/s40484-019-0167-8

Background: Traditional Chinese medicine (TCM) treats diseases in a holistic manner, while TCM formulae are multi-component, multi-target agents at the molecular level. Thus there are many parallels between the key ideas of TCM pharmacology and network pharmacology. In recent years, TCM network pharmacology has developed as an interdisciplinary field combining TCM science and network pharmacology, which studies the mechanism of TCM at the molecular level and in the context of biological networks. It provides a new research paradigm that can use modern biomedical science to interpret the mechanism of TCM, and it promises to accelerate the modernization and internationalization of TCM.

Results: In this paper we introduce state-of-the-art free data sources, web servers and software that can be used in TCM network pharmacology, including databases of TCM, drug targets and diseases, web servers for the prediction of drug targets, and tools for network and functional analysis.

Conclusions: This review could help experimental pharmacologists make better use of the existing data and methods in their study of TCM.
PROTOCOL AND TUTORIAL

Differential methylation analysis for bisulfite sequencing using DSS

Hao Feng, Hao Wu

Quantitative Biology, 2019, 7(4): 327-334. https://doi.org/10.1007/s40484-019-0183-8

Bisulfite sequencing (BS-seq) technology measures DNA methylation at single nucleotide resolution. A key task in BS-seq data analysis is to identify differential methylation (DM) under different conditions. Here we provide a tutorial for BS-seq DM analysis using the Bioconductor package DSS. DSS uses a beta-binomial model to characterize the sequence counts from BS-seq, and implements rigorous statistical methods for hypothesis testing. It provides flexible functionality for a variety of DM analyses.
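
As a rough illustration of the beta-binomial model mentioned in the abstract, the Python sketch below evaluates the probability of observing k methylated reads out of n at a single site. The function names and the mean/dispersion parameterization (mu, phi) are assumptions chosen for illustration; this is not the DSS implementation, which is an R/Bioconductor package.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(k, n, mu, phi):
    """log P(K = k) for k methylated reads out of n total reads.

    mu  : mean methylation level, 0 < mu < 1
    phi : dispersion, 0 < phi < 1 (assumed convention: phi = 1 / (alpha + beta + 1))
    """
    # Convert (mu, phi) into the two Beta shape parameters.
    s = 1.0 / phi - 1.0
    alpha, beta = mu * s, (1.0 - mu) * s
    # log of the binomial coefficient C(n, k)
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


# Hypothetical site: 20 reads, 14 of them methylated.
p = exp(beta_binomial_logpmf(14, 20, mu=0.7, phi=0.05))
print(p)
```

A larger phi spreads probability mass away from n * mu, which is how a model of this kind can accommodate variation among biological replicates that a plain binomial would not capture.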
RESEARCH ARTICLE

Applications of species accumulation curves in large-scale biological data analysis

Chao Deng, Timothy Daley, Andrew Smith

Quantitative Biology, 2015, 3(3): 135-144. https://doi.org/10.1007/s40484-015-0049-7

The species accumulation curve, or collector’s curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45–63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible.
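
For orientation, the classical Good and Toulmin estimator referred to in the abstract can be sketched in a few lines of Python. If n_j is the number of classes observed exactly j times, the expected number of new classes after increasing the sampling effort by a factor t is the alternating sum of (-1)^(j+1) t^j n_j over j. The function names below are hypothetical, and the sketch covers only the raw power series, which is generally unstable for t > 1; it does not implement the rational function approximations or the R package described by the authors.

```python
from collections import Counter


def counts_histogram(observations):
    """Return {j: n_j}, where n_j is the number of classes seen exactly j times."""
    per_class = Counter(observations)
    return dict(Counter(per_class.values()))


def good_toulmin(hist, t):
    """Expected number of new classes if sampling effort grows by a factor t.

    Uses the raw alternating series sum_j (-1)^(j+1) * t^j * n_j,
    which is only well behaved for 0 <= t <= 1.
    """
    return sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in hist.items())


# Toy sample: "a" seen twice, "b" once, "c" three times, "d" once.
sample = ["a", "a", "b", "c", "c", "c", "d"]
hist = counts_histogram(sample)
print(good_toulmin(hist, 0.5))
```

In this toy sample n_1 = 2, n_2 = 1 and n_3 = 1, so the estimate for t = 0.5 is 1.0 - 0.25 + 0.125 = 0.875 expected new classes.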
MINI REVIEW

Emerging deep learning methods for single-cell RNA-seq data analysis

Jie Zheng, Ke Wang

Quantitative Biology, 2019, 7(4): 247-254. https://doi.org/10.1007/s40484-019-0189-2

Deep learning is making major breakthroughs in several areas of bioinformatics. Anticipating that this will occur soon for single-cell RNA-seq data analysis, we review newly published deep learning methods that help tackle computational challenges. Autoencoders are found to be the dominant approach. However, methods based on deep generative models such as generative adversarial networks (GANs) are also emerging in this area.
REVIEW

Pre-mRNA modifications and their role in nuclear processing

Nicole M. Martinez, Wendy V. Gilbert

Quantitative Biology, 2018, 6(3): 210-227. https://doi.org/10.1007/s40484-018-0147-4

Background: Cellular non-coding RNAs are extensively modified post-transcriptionally, with more than 100 chemically distinct nucleotides identified to date. In the past five years, new sequencing based methods have revealed widespread decoration of eukaryotic messenger RNA with diverse RNA modifications whose functions in mRNA metabolism are only beginning to be known.
Results: Since most of the identified mRNA modifying enzymes are present in the nucleus, these modifications have the potential to function in nuclear pre-mRNA processing including alternative splicing. Here we review recent progress towards illuminating the role of pre-mRNA modifications in splicing and highlight key areas for future investigation in this rapidly growing field.
Conclusions: Future studies to identify which modifications are added to nascent pre-mRNA and to interrogate the direct effects of individual modifications are likely to reveal new mechanisms by which nuclear pre-mRNA processing is regulated.
REVIEW

From Phage lambda to human cancer: endogenous molecular-cellular network hypothesis

Gaowei Wang, Xiaomei Zhu, Leroy Hood, Ping Ao

Quantitative Biology, 2013, 1(1): 32-49. https://doi.org/10.1007/s40484-013-0007-1

Experimental evidence and theoretical analyses have amply suggested that in cancer genesis and progression genetic information is very important but not the whole story. Nevertheless, “cancer as a disease of the genome” is still currently the dominant doctrine. Against this background, and based on the fundamental properties of biological systems, we recently proposed a new endogenous molecular-cellular network theory for cancer. Similar proposals were also made by others. The new theory attempts to incorporate both genetic and environmental effects into one single framework, with the possibility of giving a quantitative and dynamical description. It is asserted that the complex regulatory machinery behind biological processes may be modeled by a nonlinear stochastic dynamical system similar to a noise-perturbed Morse-Smale system. Both qualitative and quantitative descriptions may be obtained. The dynamical variables are specified by a set of endogenous molecular-cellular agents and the structure of the dynamical system by the interactions among those biological agents. Here we review this theory from a pedagogical angle which emphasizes the role of modularization, hierarchy and autonomous regulation. We discuss how the core set of assumptions is exemplified in detail in one of the simple, important and well-studied model organisms, Phage lambda. With this concrete and quantitative example in hand, we show that the application of the hypothesized theory in human cancer, such as hepatocellular carcinoma (HCC), is plausible, and that it may provide a set of new insights on understanding cancer genesis and progression, and on strategies for cancer prevention, cure, and care.
RESEARCH ARTICLE

Construction of precise support vector machine based models for predicting promoter strength

Hailin Meng, Yingfei Ma, Guoqin Mai, Yong Wang, Chenli Liu

Quantitative Biology, 2017, 5(1): 90-98. https://doi.org/10.1007/s40484-017-0096-3

Background: The prediction of prokaryotic promoter strength based on sequence is of great importance not only in fundamental life science research but also in applied synthetic biology. Much progress has been made in building quantitative models for strength prediction; in particular, the introduction of machine learning methods such as artificial neural networks (ANN) has significantly improved prediction accuracy. As one of the most important machine learning methods, the support vector machine (SVM) is more powerful in learning from small sample datasets and is thus expected to work well for this problem.

Methods: To confirm this, we constructed SVM-based models to quantitatively predict promoter strength. A library of 100 promoter sequences and strength values was randomly divided into two datasets: a training set (≥10 sequences) for model training and a test set (≥10 sequences) for model testing.

Results: The results indicate that prediction performance increases with the size of the training set, and the best performance was achieved at a size of 90 sequences. After optimization of the model parameters, a high-performance model was finally trained, with a high squared correlation coefficient for fitting the training set (R²>0.99) and the test set (R²>0.98), both of which are better than those of the ANN obtained in our previous work.

Conclusions: Our results demonstrate that SVM-based models can be employed for the quantitative prediction of promoter strength.
RESEARCH ARTICLE

Regulation by competition: a hidden layer of gene regulatory network

Lei Wei, Ye Yuan, Tao Hu, Shuailin Li, Tianrun Cheng, Jinzhi Lei, Zhen Xie, Michael Q. Zhang, Xiaowo Wang

Quantitative Biology, 2019, 7(2): 110-121. https://doi.org/10.1007/s40484-018-0162-5

Background: Molecular competition brings about trade-offs of shared limited resources among the cellular components, and thus introduces a hidden layer of regulatory mechanism by connecting components even without direct physical interactions. Several molecular competition scenarios have been observed recently, but there is still a lack of systematic quantitative understanding to reveal the essence of molecular competition.
Methods: Here, by abstracting the analogous competition mechanism behind diverse molecular systems, we built a unified coarse-grained competition motif model to systematically integrate experimental evidence from these processes and analyzed the general properties they share, from steady-state behavior to dynamic responses.
Results: We could predict in what molecular environments competition would reveal threshold behavior or display a negative linear dependence. We quantified how competition can shape the regulator-target dose-response curve, modulate dynamic response speed, control target expression noise, and introduce correlated fluctuations between targets.
Conclusions: This work uncovered the complexity and generality of the molecular competition effect as a hidden layer of gene regulatory networks, and therefore provided a unified insight and a theoretical framework to understand and employ competition in both natural and synthetic systems.
