Cover illustration
Biological studies are like fishing in a sea of information. We draw samples from this boundless sea using technologies such as Hi-C, ChIP-seq and RNA-seq. Even with high-throughput technologies, we can see only a small piece of the whole system. Computational methods are needed to separate signal from noise and to obtain a complete picture of the system.
Background: Circular RNAs (circRNAs) from back-spliced exon(s) are characterized by a covalently closed loop with neither 5′ to 3′ polarity nor a polyadenylated tail. By using specific computational approaches that identify reads mapped to back-splice junctions with a reversed genomic orientation, tens of thousands of circRNAs have recently been re-identified in various cell lines/tissues and across different species. Increasing lines of evidence suggest that back-splicing is catalyzed by the canonical spliceosomal machinery and modulated by cis-elements and trans-factors.
Results: In this mini-review, we discuss our current understanding of the regulation of circRNA biogenesis, focusing mainly on how complementary sequences, especially Alu elements in humans, regulate circRNA formation.
Conclusions: Back-splicing can be significantly facilitated by RNA pairs formed by orientation-opposite complementary sequences that juxtapose the flanking introns of circularized exon(s). RNA pairs formed within individual introns compete with RNA pairs formed across flanking introns in the same gene locus, leading to distinct choices between canonical splicing and back-splicing. Multiple RNA pairs that bracket different circle-forming exons compete for alternative back-splicing selection, resulting in multiple circRNAs generated from a single gene locus.
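The idea of recognizing a back-splice junction from a "reversed genomic orientation" can be illustrated with a minimal sketch. It assumes a split read has already been aligned as two segments given in transcript 5′ to 3′ order. The `Segment` type and the `is_back_splice` function are hypothetical names introduced here for illustration, not part of any published circRNA caller, and real tools apply many further filters (splice-site motifs, annotation, read support).

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One aligned piece of a split read (hypothetical illustration type)."""
    chrom: str
    strand: str  # '+' or '-'
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def is_back_splice(first: Segment, second: Segment) -> bool:
    """Return True if two read segments, given in transcript 5'->3' order,
    map in reversed genomic order, as expected across a back-splice junction."""
    if first.chrom != second.chrom or first.strand != second.strand:
        return False
    if first.strand == '+':
        # Linear splicing on '+': the first segment lies upstream (smaller coordinates).
        # Back-splicing: the first segment lies downstream of the second.
        return first.start >= second.end
    # On '-', transcript order runs toward decreasing coordinates,
    # so the reversed case is the first segment having the smaller coordinates.
    return first.end <= second.start


linear = is_back_splice(Segment('chr1', '+', 100, 150), Segment('chr1', '+', 500, 550))
circular = is_back_splice(Segment('chr1', '+', 500, 550), Segment('chr1', '+', 100, 150))
```

Here `linear` is False and `circular` is True, because only in the second case does the 5′ part of the read map downstream of its 3′ part.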
Background: Mammalian brains are composed of a large number of specialized cell types with diverse molecular compositions, functions and differentiation potentials. The application of recently developed single-cell RNA sequencing (scRNA-seq) technology in this field has provided new insights into this sophisticated system, deepened our understanding of cell type diversity and led to the discovery of novel cell types.
Results: Here we review recent progress in applying this technology to the study of brain cell heterogeneity, adult neurogenesis and brain tumors. We then discuss some current limitations and future directions of using scRNA-seq in the investigation of the nervous system.
Conclusions: We believe the application of single-cell RNA sequencing in neuroscience will accelerate the progress of big brain projects.
Background: In eukaryotic genomes, chromatin is not randomly distributed in cell nuclei, but instead is organized into higher-order structures. Emerging evidence indicates that these higher-order chromatin structures play important roles in regulating genome functions such as transcription and DNA replication. With the advancement of 3C (chromosome conformation capture) based technologies, Hi-C has been widely used to investigate genome-wide long-range chromatin interactions during cellular differentiation and oncogenesis. Since the first publication of the Hi-C assay in 2009, many bioinformatic tools have been implemented for processing Hi-C data, from mapping raw reads to normalizing the contact matrix and higher-level interpretation, either providing a whole workflow pipeline or focusing on a particular process.
Results: This article reviews the general Hi-C data processing workflow and the currently popular Hi-C data processing tools. We highlight how these tools are used for a full interpretation of Hi-C results.
Conclusions: The Hi-C assay is a powerful tool for investigating higher-order chromatin structure. Continued development of novel methods for Hi-C data analysis will be necessary for a better understanding of the regulatory function of genome organization.
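As a rough illustration of what "normalizing the contact matrix" can mean, the sketch below balances a small symmetric contact matrix so that every bin ends up with the same total contact count. This is only one family of normalization approaches (matrix balancing with a damped, square-root update), chosen here as an assumption. The abstract does not specify a method, and the function name `balance` and the toy matrix are hypothetical.

```python
import math


def balance(matrix, iterations=200):
    """Symmetrically rescale a square contact matrix so row sums become equal.

    Returns the balanced matrix and per-bin bias factors such that
    balanced[i][j] == matrix[i][j] / (bias[i] * bias[j]).
    """
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(iterations):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Damped (square-root) update; bins with no contacts are left untouched.
        scale = [math.sqrt(s / mean) if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return m, bias


raw = [[10.0, 2.0, 1.0],
       [2.0, 5.0, 3.0],
       [1.0, 3.0, 8.0]]
balanced, bias = balance(raw)
row_sums = [sum(row) for row in balanced]
```

After balancing, the entries of `row_sums` agree to within numerical tolerance, and the matrix stays symmetric because each entry is divided by the product of the two bin-level factors.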
Background: Gene transcription in eukaryotic cells is collectively controlled by a large panel of chromatin-associated proteins, and ChIP-seq is now widely used to locate their binding sites across the whole genome. Inferring the differential binding sites of these proteins between biological conditions by comparing the corresponding ChIP-seq samples is of general interest, yet it remains a computationally challenging task.
Results: Here, we briefly review the computational tools developed in recent years for differential binding analysis with ChIP-seq data. The methods are extensively classified by their strategy of statistical modeling and scope of application. Finally, a decision tree is presented for choosing proper tools based on the specific dataset.
Conclusions: Computational tools for differential binding analysis with ChIP-seq data vary significantly with respect to their applicability and performance. This review can serve as a practical guide for readers to select appropriate tools for their own datasets.
Background: Genetic admixture refers to the process or consequence of interbreeding between two or more previously isolated populations within a species. Compared to many other evolutionary driving forces such as mutations, genetic drift, and natural selection, genetic admixture is a quick mechanism for shaping population genomic diversity. In particular, admixture results in “recombination” of genetic variants that have been fixed in different populations, which has many evolutionary and medical implications.
Results: However, it is challenging to accurately reconstruct population admixture history and to understand population admixture dynamics. In this review, we provide an overview of models, methods, and tools for ancestry inference and admixture analysis.
Conclusions: Many methods and tools used for admixture analysis were originally developed to analyze human data, but they can also be applied directly, or with slight modification, to study non-human species.
Background: Aging is a complex systems-level problem that needs a systems-level solution. However, system models of aging and longevity, although urgently needed, are still lacking, largely due to the paucity of conceptual frameworks for modeling such a complex process.
Results: We propose that aging can be viewed as a decline in system capacity, defined as the maximum level of output that a system produces to fulfill demands. Classical aging hallmarks and anti-aging strategies can be well-aligned to system capacity. Genetic variants responsible for lifespan variation across individuals or species can also be explained by their roles in system capacity. We further propose promising directions to develop systems approaches to modulate system capacity and thus extend both healthspan and lifespan.
Conclusions: The system capacity model of aging provides an opportunity to examine aging at the systems level. This model predicts that the extent to which aging can be modulated is normally limited by the upper bound of the system capacity of a species. Within such a boundary, aging can be delayed by moderately increasing an individual’s system capacity. Beyond such a boundary, increasing the upper bound is required, which is not unrealistic given the unlimited potential of regenerative medicine in the future, but it requires increasing the capacity of the whole system instead of only part of it.
Background: The increase in global population, climate change and the stagnation of crop yield per unit land area in recent decades urgently call for a new approach to support contemporary crop improvement. ePlant is a mathematical model of plant growth and development with a high level of mechanistic detail, designed to meet this challenge.
Results: ePlant integrates modules developed for processes occurring at drastically different temporal (10^-8 to 10^6 seconds) and spatial (10^-10 to 10 meters) scales, incorporating diverse physical, biophysical and biochemical processes including gene regulation, metabolic reaction, substrate transport and diffusion, energy absorption, transfer and conversion, organ morphogenesis, plant environment interaction, etc. Individual modules are developed using a divide-and-conquer approach; modules at different temporal and spatial scales are integrated through transfer variables. We further propose a supervised learning procedure based on information geometry to combine model and data for both knowledge discovery and model extension or advances. We finally discuss the recent formation of a global consortium, which includes experts in plant biology, computer science, statistics, agronomy, phenomics, etc. aiming to expedite the development and application of ePlant or its equivalents by promoting a new model development paradigm where models are developed as a community effort instead of driven mainly by individual labs’ effort.
Conclusions: ePlant, as a major research tool to support quantitative and predictive plant science research, will play a crucial role in future model-guided crop engineering, breeding and agronomy.
The promise that big data will revolutionize scientific discovery and technology innovation is now widely recognized. With the explosive growth of biomedical data, life science is being transformed into a digital science in which novel insights are gained from in-depth data analysis and modeling. Extensive and innovative utilization of biomedical big data is key to the success of precision medicine. Therefore, constructing a centralized national-level biomedical big data infrastructure has become crucial and urgent for China. Such infrastructure should provide superb capacity for safe data storage, standardized data processing and quality control, systematic data integration across multiple types, in-depth data mining and effective data sharing. A full data chain service including information retrieval, knowledge discovery and technology support can be provided to data centers, research institutes and healthcare industries. Agreements have been signed under which a main node of the infrastructure, relying on the Shanghai Institutes for Biological Sciences, will be located in Shanghai, and a backup node will be set up in Guizhou Province. After a construction period of five years, the infrastructure should greatly enhance China’s core competence in the collection, interpretation and application of biomedical big data.