Cover illustration
Sequence-specific transcription factors establish a diverse network of protein:DNA contacts to recognize target sites in the genome and fine-tune regulatory output. Closely related proteins within sub-families (e.g., OmpR-family paralogs, profiled in this issue) possess substantial structural and functional similarities, but key “specificity-determining residues” (SDRs) can introduce novel base preferences and global binding modes. This image displays the newly derived bindin ...
Background: Self-sustained oscillations are a ubiquitous and vital phenomenon in living systems. From primitive single-celled bacteria to the most sophisticated organisms, periodicities have been observed in a broad spectrum of biological processes such as neuron firing, heartbeats, cell cycles, and circadian rhythms. Defects in these oscillators can cause diseases ranging from insomnia to cancer. Elucidating their fundamental mechanisms is of great significance for understanding these diseases, yet it remains challenging because of the complexity and diversity of the oscillators.
Results: Approaches in quantitative systems biology and synthetic biology have been most effective when they simplify the systems to contain only the most essential regulators. Here, we review major progress in understanding biological oscillators using these approaches. The quantitative systems biology approach allows identification of the essential components of an oscillator in an endogenous system. The synthetic biology approach uses this knowledge to design the simplest de novo oscillators in both live cells and cell-free systems. These synthetic oscillators are tractable to further detailed analysis and manipulation.
Conclusion: With the recent development of biological and computational tools, both approaches have made significant achievements.
Background: Developmental patterning during fly embryogenesis is highly reproducible and accurate at the single-cell level, despite gene expression noise and external perturbations such as variation in embryo length, temperature, and genes. To reveal the underlying mechanism, it is important to characterize how noise is transmitted during dynamic pattern formation. Two hypotheses have been proposed. The “channel” scenario requires a highly reproducible input and an accurate interpretation by downstream genes. In contrast, the “filter” scenario proposes a noisy input and a noise filter implemented through cross-regulation of the downstream network. Which scenario fly embryogenesis follows has been much debated.
Results: The first 3 h of developmental patterning in fly embryos is orchestrated by a hierarchical segmentation gene network, which rewires upon the maternal-to-zygotic transition. Starting from the highly reproducible maternal gradients, the positional information is refined to single-cell precision through highly dynamic zygotic gene expression profiles. Thus, fly embryo development might strictly fit neither the originally proposed “filter” scenario nor the “channel” scenario. The controversy over which scenario fly embryogenesis follows could be further clarified by combining quantitative measurements and modeling.
Conclusions: Fly embryos have become an ideal model system for quantitative systems biology studies. The mechanisms discovered from fly embryogenesis will deepen our understanding of noise control in gene networks, facilitate the search for more efficient and safer methods of cell programming and reprogramming, and have great potential for tissue engineering and regenerative medicine.
Background: The number of biological knowledge bases and databases storing metabolic pathway information and models has been growing rapidly. These resources are diverse in their types of information and data, their analytical tools, and their objectives. Here we present a review of the most popular metabolic pathway databases and model repositories, focusing on their scope, their content (including reactions, enzymes, compounds, and genes), and their applicability. The review aims to help researchers choose a suitable database or model repository according to the information and data required, by providing an in-depth look at each pathway resource.
Results: Four pathway databases and three model repositories were selected on the basis of popularity and diversity. Our review showed that the pathway resources vary in many aspects, such as their scope, content, data access, and tools. In addition, inconsistencies were observed in the nomenclature and representation of database entities. The three model repositories reviewed do not offer a brief description of the models’ characteristics, such as simulation conditions.
Conclusions: The inconsistencies among the databases in representing their contents may hamper full use of the knowledge accumulated in these databases in particular and in the area of systems biology at large. Therefore, it is strongly recommended that database creators and developers of metabolic network models follow international standards for the nomenclature of reactions and metabolites. In addition, computationally generated models obtained from model repositories should be used with manual curation, as they lack some components that are necessary for full functionality of the models.
Background: Gene co-expression and differential co-expression analyses have been increasingly used to study co-functional and co-regulatory biological mechanisms from large-scale transcriptomics data sets.
Methods: In this study, we develop MRHCA, a nonparametric approach to identify hub genes and modules in a large co-expression network with low computational and memory cost.
Results: We applied the method to simulated transcriptomics data sets and demonstrated that MRHCA can accurately identify hub genes and estimate the size of co-expression modules. By applying MRHCA and differential co-expression analysis to E. coli and TCGA cancer data, we identified significant condition-specific activated genes in E. coli and distinct gene expression regulatory mechanisms between the cancer types with high copy number variation and those with small somatic mutations.
Conclusion: Our analysis demonstrated that, compared with existing methods, MRHCA can (i) deal with large association networks, (ii) rigorously assess statistical significance for hubs and module sizes, (iii) identify co-expression modules with low associations, (iv) detect small and significant modules, and (v) allow genes to be present in more than one module.
Background: Developing appropriate computational tools to distill biological insights from large-scale gene expression data has been an important part of systems biology. Considering that gene relationships may change or only exist in a subset of collected samples, biclustering, which involves clustering both genes and samples, has become increasingly important, especially when the samples are pooled from a wide range of experimental conditions.
Methods: In this paper, we introduce a new biclustering algorithm to find subsets of genomic expression features (EFs) (e.g., genes, isoforms, exon inclusion) that show strong “group interactions” under certain subsets of samples. Group interactions are defined by strong partial correlations, or equivalently, conditional dependencies between EFs after removing the influences of a set of other functionally related EFs. Our new biclustering method, named SCCA-BC, extends an existing method for group interaction inference, which is based on sparse canonical correlation analysis (SCCA) coupled with repeated random partitioning of the gene expression data set.
Results: SCCA-BC gives sensible results on real data sets and outperforms most existing methods in simulations. Software is available at https://github.com/pimentel/scca-bc.
Conclusions: SCCA-BC seems to work in numerous conditions and the results seem promising for future extensions. SCCA-BC has the ability to find different types of bicluster patterns, and it is especially advantageous in identifying a bicluster whose elements share the same progressive and multivariate normal distribution with a dense covariance matrix.
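The notion of a “group interaction” as a strong partial correlation can be illustrated with a minimal sketch. It is not the SCCA-BC algorithm itself: it only computes a first-order partial correlation between two expression features after removing the influence of a third, using the standard textbook formula. The function names (`pearson`, `partial_corr`, `simulate`) and the simulated data are assumptions made purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(n=2000, seed=0):
    """Hypothetical data: x and y are both driven by z, with independent noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate()
    print("marginal correlation:", round(pearson(x, y), 3))
    print("partial correlation given z:", round(partial_corr(x, y, z), 3))
```

In this toy setting the two features are strongly correlated only because they share a common driver, so the partial correlation given that driver is close to zero. A conditional dependency in the sense described in the Methods would instead show a partial correlation that stays far from zero after conditioning.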
Background: Sequence-specific binding by transcription factors (TFs) plays a significant role in the selection and regulation of target genes. At the protein:DNA interface, amino acid side-chains construct a diverse physicochemical network of specific and non-specific interactions, and seemingly subtle changes in amino acid identity at certain positions may dramatically impact TF:DNA binding. Variation of these specificity-determining residues (SDRs) is a major mechanism of functional divergence between TFs with strong structural or sequence homology.
Methods: In this study, we employed a combination of high-throughput specificity profiling by SELEX and Spec-seq, structural modeling, and evolutionary analysis to probe the binding preferences of winged helix-turn-helix TFs belonging to the OmpR sub-family in Escherichia coli.
Results: We found that E. coli OmpR paralogs recognize tandem, variably spaced repeats composed of “GT-A” or “GCT”-containing half-sites. Some divergent sequence preferences observed within the “GT-A” mode correlate with amino acid similarity; conversely, “GCT”-based motifs were observed for a subset of paralogs with low sequence homology. Direct specificity profiling of a subset of OmpR homologues (CpxR, RstA, and OmpR) as well as predicted “SDR-swap” variants revealed that individual SDRs may impact sequence preferences locally through direct contact with DNA bases or distally via the DNA backbone.
Conclusions: Overall, our work provides evidence for a common structural “code” for sequence-specific wHTH-DNA interactions, and demonstrates that surprisingly modest residue changes can enable recognition of highly divergent sequence motifs. Further examination of SDR predictions will likely reveal additional mechanisms controlling the evolutionary divergence of this important class of transcriptional regulators.
Background: Clinical studies and genetic analyses have revealed that juvenile myelomonocytic leukemia (JMML) is caused by somatic and/or germline mutations of genes involved in the RAS/MAPK signalling pathway. Given the vastly different clinical prognoses among individual patients with this disease, mutations in genes of other pathways may be involved.
Methods: In this study, we conducted whole-exome and cancer-panel sequencing analyses on a bone marrow sample from a 2-year-old JMML patient. We also measured the microRNA profile of the same patient’s bone marrow sample and compared the results with those of normal mature monocytic cells from pooled peripheral blood.
Results: We identified additional novel mutations in the PI3K/AKT pathway and verified them by targeted cancer-panel sequencing. We confirmed the previously tested PTPN11 gene mutation (exon 3 181G>T) in the same sample and identified new nonsynonymous mutations in the NTRK1, HMGA2, MLH3, MYH9 and AKT1 genes. Many of the microRNAs found to be differentially expressed are known to act as oncogenic microRNAs (onco-microRNAs or oncomiRs), whose target genes are enriched in the PI3K/AKT signalling pathway.
Conclusions: Our study suggests an alternative mechanism for JMML pathogenesis in addition to the RAS/MAPK pathway. This discovery may provide new genetic markers for diagnosis and new therapeutic targets for JMML patients in the future.