Cover illustration
Alternative cleavage and polyadenylation (APA) results in mRNA isoforms with different 3′ UTR lengths. Significance analysis of alternative polyadenylation using RNA-seq (SAAP-RS) is a newly developed computational method that interrogates RNA-seq reads mapped to different 3′ UTR sequences (center) and analyzes differential expression of 3′ UTR isoforms. SAAP-RS reveals 3′ UTR length differences among mouse brain cells, including neurons, astrocytes, oligodendrocytes, endothel...
Background: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date.
Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical relevance.
Conclusions: Statistical and computational methods for analyzing RNA-seq data have advanced significantly in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models with regard to their assumptions, in the hope of helping users select appropriate methods as needed and assisting developers in future method development.
Background: Cellular non-coding RNAs are extensively modified post-transcriptionally, with more than 100 chemically distinct nucleotides identified to date. In the past five years, new sequencing-based methods have revealed widespread decoration of eukaryotic messenger RNA with diverse RNA modifications whose functions in mRNA metabolism are only beginning to be understood.
Results: Since most of the identified mRNA modifying enzymes are present in the nucleus, these modifications have the potential to function in nuclear pre-mRNA processing including alternative splicing. Here we review recent progress towards illuminating the role of pre-mRNA modifications in splicing and highlight key areas for future investigation in this rapidly growing field.
Conclusions: Future studies to identify which modifications are added to nascent pre-mRNA and to interrogate the direct effects of individual modifications are likely to reveal new mechanisms by which nuclear pre-mRNA processing is regulated.
Background: Our understanding of post-transcriptional gene regulation has increased exponentially with the development of robust methods to define protein-RNA interactions across the transcriptome. In this review, we highlight the evolution and successful applications of crosslinking and immunoprecipitation (CLIP) methods to interrogate protein-RNA interactions in a transcriptome-wide manner.
Results: Here, we survey the vast array of in vitro and in vivo approaches used to identify protein-RNA interactions, including but not limited to electrophoretic mobility shift assays, systematic evolution of ligands by exponential enrichment (SELEX), and RIP-seq. We particularly emphasize the advancement of CLIP technologies, and detail protocol improvements and computational tools used to analyze the output data. Importantly, we discuss how profiling protein-RNA interactions can delineate biological functions including splicing regulation, alternative polyadenylation, cytoplasmic decay substrates, and miRNA targets.
Conclusions: In summary, this review outlines the benefits of characterizing RNA-protein networks to further understand the regulation of gene expression and disease pathogenesis. We also comment on how future CLIP technologies can be adapted to address outstanding questions related to many aspects of RNA metabolism and further advance our understanding of RNA biology.
Background: Most eukaryotic protein-coding genes exhibit alternative cleavage and polyadenylation (APA), resulting in mRNA isoforms with different 3′ untranslated regions (3′ UTRs). Studies have shown that brain cells tend to express long 3′ UTR isoforms using distal cleavage and polyadenylation sites (PASs).
Methods: Using our recently developed, comprehensive PAS database PolyA_DB, we developed an efficient method to examine APA, named Significance Analysis of Alternative Polyadenylation using RNA-seq (SAAP-RS). We applied this method to study APA in brain cells and neurogenesis.
Results: We found that neurons globally express longer 3′ UTRs than other cell types in the brain, whereas microglia and endothelial cells express substantially shorter 3′ UTRs. We show that the 3′ UTR diversity across brain cells can be corroborated with single-cell sequencing data. Further analysis of APA regulation of 3′ UTRs during differentiation of embryonic stem cells into neurons indicates that a large fraction of the APA events regulated in neurogenesis are similarly modulated in myogenesis, but to a much greater extent.
Conclusion: Together, our data delineate APA profiles in different brain cells and indicate that APA regulation in neurogenesis is largely an augmented version of a process that also takes place in other types of cell differentiation.
Background: Most intronic lariats are rapidly turned over after splicing. However, new research suggests that some introns may have additional post-splicing functions. Current bioinformatics methods used to identify lariats require a sequencing read that traverses the lariat branchpoint. This method provides precise branchpoint sequence and position information, but is limited in its ability to quantify abundance of stabilized lariat species in a given RNAseq sample. Bioinformatic tools are needed to better address these emerging biological questions.
Methods: We used an unsupervised machine learning approach on sequencing reads from publicly available ENCODE data to learn to identify and quantify lariats based on RNAseq read coverage shape.
Results: We developed ShapeShifter, a novel approach for identifying and quantifying stable lariat species in RNAseq datasets. We learned a characteristic “lariat” curve from ENCODE RNAseq data and were able to estimate abundances for introns based on read coverage. Using this method we discovered new stable introns in these samples that were not represented using the older, branchpoint-traversing read method.
Conclusions: ShapeShifter provides a robust approach towards detecting and quantifying stable lariat species.
Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak-calling algorithms to detect mRNA methylation sites from MeRIP-seq data.
Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at the authors' website and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak.
Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge, and had better spatial resolution compared to the other methods.
Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.
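As an illustration of the zero-inflated negative binomial (ZINB) component named in the Methods above, the following minimal sketch evaluates a ZINB probability mass function for a single read count. It is not taken from the BaySeqPeak code: the function names and the parameterization (mean `mu`, inverse-dispersion `size`, extra-zero probability `pi_zero`) are assumptions chosen for illustration, and the hidden Markov model and MCMC inference steps are omitted entirely.

```python
import math


def nb_log_pmf(y, mu, size):
    """Log-probability of count y under a negative binomial distribution.

    Assumed parameterization: mean `mu` > 0 and inverse-dispersion `size` > 0,
    so the variance is mu + mu**2 / size (over-dispersed relative to Poisson).
    """
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def zinb_pmf(y, mu, size, pi_zero):
    """Probability of count y under a zero-inflated negative binomial.

    With probability `pi_zero` the count is a structural zero; otherwise it is
    drawn from the negative binomial above. Names here are illustrative only.
    """
    nb = math.exp(nb_log_pmf(y, mu, size))
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb


# Example: zero inflation raises the probability of observing no reads in a bin.
p_zero_plain = zinb_pmf(0, mu=5.0, size=2.0, pi_zero=0.0)
p_zero_inflated = zinb_pmf(0, mu=5.0, size=2.0, pi_zero=0.3)
```

Setting `pi_zero` to 0 recovers the ordinary negative binomial, which makes it easy to see how the extra mixture component accounts for the excess of zeros described in the Results.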