Quantitative Biology

2020-06-15 2020, Volume 8 Issue 2

Cover illustration

RNA binding proteins (RBPs) are known as key post-transcriptional regulators. A recent technology, cross-linking and immunoprecipitation followed by sequencing (CLIP-seq), has made it possible to investigate the interactions between RBPs and RNAs. However, the association between the function and the binding of RBPs has not been systematically studied. In this issue, Lin and Ouyang present a large-scale analysis of the functional targets of human RBPs based on enhanced CLIP-seq datasets. Their study uncovers that the translation termination site and the 3′ un ...

REVIEW

Applications of probability and statistics in cancer genomics

Xiaotu Ma, Sasi Arunachalam, Yanling Liu

2020, 8(2): 95-108. https://doi.org/10.1007/s40484-020-0203-8

Background: The past decade has witnessed rapid progress in our understanding of the genetics of cancer and its progression. Probabilistic and statistical modeling has played a pivotal role in the discovery of general patterns from cancer genomics datasets and continues to be of central importance for personalized medicine.

Results: In this review, we introduce cancer genomics from a probabilistic and statistical perspective. We start from (1) the functional classification of genes into oncogenes and tumor suppressor genes, then (2) demonstrate the importance of comprehensive analysis of different mutation types for individual cancer genomes, followed by (3) tumor purity analysis, which in turn leads to (4) the concepts of ploidy and clonality, which are then connected to (5) tumor evolution under treatment pressure, which yields insights into cancer drug resistance. We also discuss future challenges, including non-coding genomic regions, integrative analysis of genomics and epigenomics, and early cancer detection.

Conclusion: We believe probabilistic and statistical modeling will continue to play important roles for novel discoveries in the field of cancer genomics and personalized medicine.

REVIEW

Prediction and differential analysis of RNA secondary structure

Bo Yu, Yao Lu, Qiangfeng Cliff Zhang, Lin Hou

2020, 8(2): 109-118. https://doi.org/10.1007/s40484-020-0205-6

Background: RNA structure is the crucial basis for RNA function in various cellular processes. Over the last decade, high throughput structure profiling (SP) experiments have brought enormous insight into RNA secondary structure.
Results: In this review, we first provide an overview of approaches for RNA secondary structure prediction, including free energy-based algorithms and comparative sequence analysis. Then we introduce SP technologies, databases to document SP data, and pipelines/algorithms to normalize and interpret SP data. Computational frameworks that incorporate SP data in RNA secondary structure prediction are also presented.
Conclusions: We finally discuss potential directions for improvement in the prediction and differential analysis of RNA secondary structure.

RESEARCH ARTICLE

Large-scale analysis of the position-dependent binding and regulation of human RNA binding proteins

Jianan Lin, Zhengqing Ouyang

2020, 8(2): 119-129. https://doi.org/10.1007/s40484-020-0206-5

Background: RNA binding proteins (RBPs) play essential roles in the regulation of RNA metabolism. Recent studies have shown that RBPs achieve their functions by binding to their targets in a position-dependent pattern on RNAs. However, few studies have systematically addressed the associations between RBPs’ functions and their positional binding preferences.
Methods: Here, we present large-scale analyses of the functional targets of human RBPs by integrating the enhanced cross-linking and immunoprecipitation followed by sequencing (eCLIP-seq) datasets and the shRNA knockdown followed by RNA-seq datasets that are deposited in the integrated ENCyclopedia of DNA Elements in the human genome (ENCODE) data portal.
Results: We found that (1) binding to the translation termination site and the 3′ untranslated region is important for most human RBPs in the regulation of RNA decay; (2) RBPs’ binding and regulation follow a cell-type-specific pattern.
Conclusions: These analyses show a strong relationship between the binding position and the functions of RBPs, which provides novel insights into the regulatory mechanisms of RBPs.

RESEARCH ARTICLE

A pan-cancer integrative pathway analysis of multi-omics data

Henry Linder, Yuping Zhang

2020, 8(2): 130-142. https://doi.org/10.1007/s40484-019-0185-6

Background: Multi-view -omics datasets offer rich opportunities for integrative analysis across genomic, transcriptomic, and epigenetic data platforms. Statistical methods are needed to rigorously implement current research on functional biology, matching the complex dynamics of systems genomic datasets.
Methods: We apply imputation for missing data and a structural, graph-theoretic pathway model to a dataset of 22 cancers across 173 signaling pathways. Our pathway model integrates multiple data platforms, and we test for differential activation between cancerous tumor and healthy tissue populations.
Results: Our pathway analysis reveals significant disturbance in signaling pathways that are known to relate to oncogenesis. We identify several pathways that suggest new research directions, including the Trk signaling and focal adhesion kinase activation pathways in sarcoma.
Conclusions: Our integrative analysis confirms contemporary research findings, which supports the validity of our results. We implement an interactive data visualization for exploration of the pathway analyses, which is available online for public access.

RESEARCH ARTICLE

Confidence intervals for Markov chain transition probabilities based on next generation sequencing reads data

Lin Wan, Xin Kang, Jie Ren, Fengzhu Sun

2020, 8(2): 143-154. https://doi.org/10.1007/s40484-020-0200-y

Background: Markov chains (MC) have been widely used to model molecular sequences. Estimation of the MC transition matrix and of confidence intervals for the transition probabilities from long sequence data has been intensively studied in the past decades. In next generation sequencing (NGS), a large number of short reads are generated. These short reads can overlap and some regions of the genome may not be sequenced, resulting in a new type of data. Based on NGS data, the transition probabilities of an MC can be estimated by moment estimators. However, the classical asymptotic distribution theory for MC transition probability estimators based on long sequences is no longer valid.
Methods: In this study, we present the asymptotic distributions of several statistics related to MC based on NGS data. We show that, after scaling by the effective coverage d defined in a previous study by the authors, these statistics based on NGS data approximately follow the same distributions as the corresponding statistics for long sequences.
Results: We apply the asymptotic properties of these statistics for finding the theoretical confidence regions for MC transition probabilities based on NGS short reads data. We validate our theoretical confidence intervals using both simulated data and real data sets, and compare the results with those by the parametric bootstrap method.
Conclusions: We find that the asymptotic distributions of these statistics and the theoretical confidence intervals of transition probabilities based on NGS data given in this study are highly accurate, providing a powerful tool for NGS data analysis.
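
For orientation, the classical long-sequence setting that this abstract contrasts with NGS reads can be sketched as follows. This is only an illustrative sketch, not the authors' method: it estimates first-order transition probabilities from a single long sequence and attaches Wald-type intervals based on the usual asymptotic normal approximation. The function name `transition_estimates` and its parameters are hypothetical, and the effective-coverage scaling by d described in the abstract is not reproduced here because its definition is not given in this listing.

```python
import math
from collections import Counter


def transition_estimates(seq, alphabet="ACGT", z=1.96):
    """Estimate first-order transition probabilities from one long sequence.

    Returns {(a, b): (p_hat, lower, upper)} where the bounds are Wald-type
    intervals p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n_a), with n_a the
    number of transitions observed out of state a (an illustrative choice).
    """
    pair_counts = Counter(zip(seq, seq[1:]))
    result = {}
    for a in alphabet:
        n_a = sum(pair_counts[(a, b)] for b in alphabet)
        if n_a == 0:
            continue  # state a never appears as a transition source
        for b in alphabet:
            p_hat = pair_counts[(a, b)] / n_a
            half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n_a)
            result[(a, b)] = (
                p_hat,
                max(0.0, p_hat - half_width),
                min(1.0, p_hat + half_width),
            )
    return result


if __name__ == "__main__":
    estimates = transition_estimates("ACGTACGTAACCGGTTACGATCGA")
    for (a, b), (p_hat, lower, upper) in sorted(estimates.items()):
        print(f"{a}->{b}: {p_hat:.3f} [{lower:.3f}, {upper:.3f}]")
```

With short, overlapping reads the counts are no longer drawn from one contiguous chain, which is why the abstract reports that a rescaling is needed before intervals of this kind can be used.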

RESEARCH ARTICLE

A censored-Poisson model based approach to the analysis of RNA-seq data

Xing Chen, Yinglei Lai

2020, 8(2): 155-171. https://doi.org/10.1007/s40484-020-0208-3

Background: With recent advances in sequencing technology, the collection of RNA expression (RNA-seq) data has been growing rapidly. RNA-seq data are statistically count-type measurements. The Poisson distribution is a basic probability distribution for modeling count-type data. With Poisson regression models, various experimental factors, GC content, and alternative splicing isoforms can be flexibly considered in RNA-seq data analysis. Due to the biochemical and technical limitations of sequencing technology, biases in RNA-seq data have been recognized.
Methods: In this study, an artificial censoring approach is proposed for an isoform-specific Poisson regression model for analyzing RNA-seq data. Low expression values can be grouped (censored) into one probability category, and high expression values can also be grouped (censored) into another probability category. We have implemented the related Newton-Raphson numerical procedure to obtain the maximum likelihood estimates for our censored-Poisson regression model. The related mathematical simplifications have been derived to make the numerical computation stable and convenient.
Results: The advantages of our artificial censoring approach have been demonstrated in both simulation studies and application analysis of experimental data.
Conclusions: Our proposed artificial censoring approach allows us to focus on the majority of data. As the extreme values (tails) of data are artificially censored, more efficient analysis results can be obtained, even from relatively simple Poisson regression models. Our proposed artificial censoring approach can certainly be considered for other well-developed models or methods for RNA-seq data analysis.
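
The censoring idea described in this abstract can be illustrated with a much simpler model than the one in the paper. The sketch below assumes a single Poisson rate (no regression covariates, no isoforms) and replaces Newton-Raphson with a plain grid search; the names `censored_loglik` and `fit_rate` and the thresholds `low` and `high` are hypothetical. Counts at or below `low` contribute P(Y ≤ low) to the likelihood, counts at or above `high` contribute P(Y ≥ high), and counts in between contribute the ordinary Poisson probability.

```python
import math


def log_pmf(k, lam):
    """Log of the Poisson probability P(Y = k) for rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def cdf(k, lam):
    """Poisson P(Y <= k); returns 0.0 when k < 0."""
    return sum(math.exp(log_pmf(j, lam)) for j in range(k + 1))


def censored_loglik(counts, lam, low, high):
    """Log-likelihood with counts <= low and counts >= high each grouped."""
    p_low = cdf(low, lam)
    # Guard against floating-point underflow in the upper tail.
    p_high = max(1.0 - cdf(high - 1, lam), 1e-300)
    total = 0.0
    for y in counts:
        if y <= low:
            total += math.log(p_low)
        elif y >= high:
            total += math.log(p_high)
        else:
            total += log_pmf(y, lam)
    return total


def fit_rate(counts, low, high, grid=None):
    """Grid-search maximizer of the censored log-likelihood (illustrative)."""
    grid = grid or [0.1 * i for i in range(1, 501)]
    return max(grid, key=lambda lam: censored_loglik(counts, lam, low, high))


if __name__ == "__main__":
    data = [0, 1, 3, 4, 4, 5, 6, 7, 40]  # made-up counts with one extreme value
    print(fit_rate(data, low=1, high=10))
```

Because the extreme count only enters through the tail category, its exact magnitude no longer drives the fitted rate, which is the kind of robustness the abstract attributes to censoring.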

PROTOCOL AND TUTORIAL

Counting single cells and computing their heterogeneity: from phenotypic frequencies to mean value of a quantitative biomarker

Hong Qian, Yu-Chen Cheng

2020, 8(2): 172-176. https://doi.org/10.1007/s40484-020-0196-3

This tutorial presents a mathematical theory that relates the probability of the sample frequencies of M phenotypes in an isogenic population of N cells to the probability distribution of the sample mean of a quantitative biomarker when N is very large. An analogy to the statistical mechanics of the canonical ensemble is discussed.
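
The link between phenotype frequencies and a biomarker's sample mean can be made concrete with a small simulation. This is a sketch under stated assumptions, not the tutorial's theory: it assumes each of the M phenotypes carries one fixed biomarker value and that cells are drawn independently, so the sample mean is the frequency-weighted average of those values and approaches the expected value as N grows. The function name and the example probabilities and values are made up.

```python
import random
from collections import Counter


def sample_mean_from_frequencies(probs, values, n_cells, rng):
    """Draw n_cells phenotypes and return (sample frequencies, sample mean).

    probs[i] is the probability of phenotype i and values[i] its biomarker
    value (both hypothetical inputs for illustration).
    """
    draws = rng.choices(range(len(probs)), weights=probs, k=n_cells)
    counts = Counter(draws)
    freqs = [counts[i] / n_cells for i in range(len(probs))]
    mean = sum(f * x for f, x in zip(freqs, values))
    return freqs, mean


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.5, 0.3, 0.2]
    values = [1.0, 2.0, 4.0]
    expected = sum(p * x for p, x in zip(probs, values))
    for n_cells in (100, 10_000, 1_000_000):
        _, mean = sample_mean_from_frequencies(probs, values, n_cells, rng)
        print(n_cells, round(mean, 4), "expected", expected)
```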

MEETING REPORT

International Workshop on Applications of Probability and Statistics to Biology, July 11–13, 2019 – In Honor of Professor Minping Qian’s 80th Birthday

Minghua Deng, Jianfeng Feng, Hong Qian, Lin Wan, Fengzhu Sun

2020, 8(2): 177-186. https://doi.org/10.1007/s40484-019-0182-9
