Background: Multiplexed milliliter-scale chemostats are useful for measuring cell physiology under various degrees of nutrient limitation and for carrying out evolution experiments. In each chemostat, fresh medium containing a growth rate-limiting metabolite is pumped into the culturing chamber at a constant rate, while culture effluent exits at an equal rate. Although such devices have been developed by various labs, key parameters, namely the accuracy, precision, and operational range of the flow rate, have not been explicitly characterized.
Methods: Here we re-purpose a published multiplexed culturing device to develop a multiplexed milliliter-scale chemostat. Flow rates for eight chambers can be independently controlled over a wide range, corresponding to population doubling times of 3–13 h, without the use of expensive feedback systems.
Results: Flow rates are precise, with the maximal coefficient of variation among the eight chambers being less than 3%. Flow rates are accurate, with average flow rates falling only slightly below targets: by 3%–6% for 13-h and 0.6%–1.0% for 3-h doubling times. This deficit is largely due to evaporation and should be correctable. We experimentally demonstrate that our device allows accurate and precise quantification of population phenotypes.
Conclusions: We achieve precise control of cellular growth in a low-cost milliliter-scale chemostat array, and show that the achieved precision reduces the error when measuring biological processes.
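The flow rates described above follow from the standard chemostat steady-state relation: the dilution rate equals the growth rate, D = ln(2)/t_d, so the required medium flow rate is F = D·V. A minimal sketch (the 20 mL working volume is a hypothetical value for illustration, not the device's actual volume):

```python
import math

def flow_rate_ml_per_h(volume_ml: float, doubling_time_h: float) -> float:
    """At chemostat steady state, dilution rate D = ln(2)/t_d equals the
    growth rate, so the required medium flow rate is F = D * V."""
    dilution_rate = math.log(2) / doubling_time_h   # per hour
    return dilution_rate * volume_ml

# Hypothetical 20 mL working volume; target doubling times of 3 h and 13 h.
for td in (3, 13):
    print(f"t_d = {td:2d} h -> {flow_rate_ml_per_h(20, td):.2f} mL/h")
```

Slow doubling times thus demand very low flow rates, which is where accuracy losses from evaporation matter most.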
Background: Module detection is widely used to analyze and visualize biological networks, and a number of methods and tools have been developed for this purpose. Bipartite module detection is similarly useful for mining and analyzing bipartite biological networks, and a few methods have been developed for it. However, few user-friendly toolkits exist for this task.
Methods: To this end, we develop BMTK, an online web toolkit that implements seven existing methods.
Results: BMTK provides a uniform operation platform with visualization functions, standardizes input and output formats, and improves the algorithmic structure to enhance computing speed. We also apply the toolkit to a drug-target bipartite network to demonstrate its effectiveness.
Conclusions: BMTK will be a powerful tool for detecting bipartite modules in diverse bipartite biological networks.
Availability: The web application is freely accessible at http://www.zhanglabtools.net/BMTK.
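As an illustration of the kind of computation bipartite module detectors perform (this is one standard quality function, Barber's bipartite modularity, not a description of BMTK's internals), a minimal NumPy sketch scoring a module assignment on a toy drug-target network:

```python
import numpy as np

def barber_modularity(A, row_labels, col_labels):
    """Barber's bipartite modularity: Q_B = (1/m) * sum_ij (A_ij - k_i d_j / m)
    over row/column node pairs assigned to the same module."""
    m = A.sum()
    k = A.sum(axis=1)          # degrees of row nodes (e.g., drugs)
    d = A.sum(axis=0)          # degrees of column nodes (e.g., targets)
    expected = np.outer(k, d) / m
    same = np.equal.outer(np.asarray(row_labels), np.asarray(col_labels))
    return ((A - expected) * same).sum() / m

# Toy biadjacency matrix with two clean modules.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
print(barber_modularity(A, [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.5 for this perfect split
```

Module detection methods then search over label assignments to maximize such a score.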
Background: Gene co-expression and differential co-expression analyses have been increasingly used to study co-functional and co-regulatory biological mechanisms in large-scale transcriptomics data sets.
Methods: In this study, we develop MRHCA, a nonparametric approach to identify hub genes and modules in large co-expression networks with low computational and memory cost.
Results: We have applied the method to simulated transcriptomics data sets and demonstrated that MRHCA can accurately identify hub genes and estimate the size of co-expression modules. By applying MRHCA and differential co-expression analysis to E. coli and TCGA cancer data, we have identified significant condition-specific activated genes in E. coli and distinct gene expression regulatory mechanisms between cancer types with high copy number variation and those with small somatic mutations.
Conclusion: Our analysis has demonstrated that, compared with existing methods, MRHCA can (i) deal with large association networks, (ii) rigorously assess statistical significance for hubs and module sizes, (iii) identify co-expression modules with low associations, (iv) detect small and significant modules, and (v) allow genes to be present in more than one module.
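To make the hub-gene idea concrete, here is a minimal rank-based (hence nonparametric) sketch: genes are scored by how many strong Spearman correlations they have in simulated expression data. This illustrates the general notion of a co-expression hub only; it is not the MRHCA statistic itself, and the threshold and data sizes are arbitrary choices for the toy example:

```python
import numpy as np

rng = np.random.default_rng(0)

def spearman_corr(X):
    """Spearman correlation computed as Pearson correlation of within-gene
    ranks (rows are genes, columns are samples; assumes no ties)."""
    ranks = X.argsort(axis=1).argsort(axis=1).astype(float)
    return np.corrcoef(ranks)

def hub_scores(X, threshold=0.7):
    """Degree of each gene in the co-expression network obtained by
    thresholding absolute Spearman correlations."""
    C = np.abs(spearman_corr(X))
    np.fill_diagonal(C, 0.0)
    return (C > threshold).sum(axis=1)

# Simulated data: genes 0-4 share a latent driver signal; genes 5-19 are noise.
driver = rng.normal(size=50)
X = rng.normal(size=(20, 50))
X[:5] += 3 * driver   # a tightly correlated module around a hub-like signal
print(hub_scores(X))  # the first five genes stand out as hubs
```

Real hub detection must additionally assess statistical significance of these degrees, which is one of the contributions claimed above.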
Background: Sequence-specific binding by transcription factors (TFs) plays a significant role in the selection and regulation of target genes. At the protein:DNA interface, amino acid side-chains construct a diverse physicochemical network of specific and non-specific interactions, and seemingly subtle changes in amino acid identity at certain positions may dramatically impact TF:DNA binding. Variation of these specificity-determining residues (SDRs) is a major mechanism of functional divergence between TFs with strong structural or sequence homology.
Methods: In this study, we employed a combination of high-throughput specificity profiling by SELEX and Spec-seq, structural modeling, and evolutionary analysis to probe the binding preferences of winged helix-turn-helix TFs belonging to the OmpR sub-family in Escherichia coli.
Results: We found that E. coli OmpR paralogs recognize tandem, variably spaced repeats composed of “GT-A” or “GCT”-containing half-sites. Some divergent sequence preferences observed within the “GT-A” mode correlate with amino acid similarity; conversely, “GCT”-based motifs were observed for a subset of paralogs with low sequence homology. Direct specificity profiling of a subset of OmpR homologues (CpxR, RstA, and OmpR) as well as predicted “SDR-swap” variants revealed that individual SDRs may impact sequence preferences locally through direct contact with DNA bases or distally via the DNA backbone.
Conclusions: Overall, our work provides evidence for a common structural “code” for sequence-specific wHTH-DNA interactions, and demonstrates that surprisingly modest residue changes can enable recognition of highly divergent sequence motifs. Further examination of SDR predictions will likely reveal additional mechanisms controlling the evolutionary divergence of this important class of transcriptional regulators.
Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which uses only the features with the largest variable importance scores. Yet the performance of this method is not satisfactory, possibly due to its rigid feature selection and the increased correlation between trees of the forest.
Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node when building trees, samples features according to their variable importance scores and then selects the best split from the randomly selected features.
Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared to the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases.
Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones, and hence has improved prediction accuracy in the presence of weak signals and large noise. We have implemented an R package “viRandomForests” based on the original R package “randomForest”; it can be freely downloaded from http://zhaocenter.org/software.
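The core change to the node-splitting step can be sketched as follows (a Python illustration of the sampling idea, with made-up importance scores; the actual implementation is the R package named above):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_split_features(importances, mtry):
    """At each node, draw mtry candidate features with probability proportional
    to their variable importance scores, instead of uniformly as in standard
    Random Forests; the best split is then chosen among these candidates."""
    p = np.clip(importances, 1e-12, None)   # keep weak features samplable
    probs = p / p.sum()
    return rng.choice(len(importances), size=mtry, replace=False, p=probs)

# Hypothetical importance scores, e.g., from an initial standard forest.
imp = np.array([5.0, 4.0, 0.5, 0.2, 0.1, 0.1])
counts = np.zeros(len(imp))
for _ in range(2000):
    counts[sample_split_features(imp, mtry=2)] += 1
print(counts / counts.sum())   # informative features dominate, others still appear
```

Because low-importance features retain a nonzero sampling probability, weak signals are down-weighted rather than discarded, unlike in feature elimination Random Forests.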
Background: Precision medicine attempts to tailor the right therapy to the right patient. Recent progress in the large-scale collection of patients’ tumor molecular profiles in The Cancer Genome Atlas (TCGA) provides a foundation for the systematic discovery of potential drug targets specific to different types of cancer. However, we still lack powerful computational methods to effectively integrate multiple omics data with protein-protein interaction networks for optimum target and drug recommendation for an individual patient.
Methods: In this study, a computational method, Precision Medicine Target-Drug Selection (PMTDS), based on genetic interaction networks is developed to select the optimum targets and associated drugs for precision-medicine-style treatment of cancer. The PMTDS system includes three parts: a personalized medicine knowledgebase for each cancer type, a genetic interaction network-based algorithm, and a single patient’s molecular profile. The knowledgebase integrates cancer drugs, drug-target databases, and gene biological pathway networks. The molecular profile of each tumor consists of DNA copy number alterations, gene mutations, and tumor gene expression variation relative to adjacent normal tissue.
Results: The integrated PMTDS system is applied to select candidate target-drug pairs for 178 TCGA pancreatic adenocarcinoma (PDAC) tumors. Known drug targets of PDAC treatment (EGFR, IGF1R, ERBB2, NR1I2 and AKR1B1) are identified, which provides important evidence of the PMTDS algorithm’s accuracy. Other potential targets, including PTK6, ATF and SYK, are also recommended for PDAC. Further validation is provided by comparison of the selected targets with both cell line molecular profiles from the Cancer Cell Line Encyclopedia (CCLE) and drug response data from the Cancer Therapeutics Response Portal (CTRP). Experimental analysis of forty-six individual pancreatic cancer samples shows that drugs selected by PMTDS have more sample-specific efficacy than the current clinical PDAC therapies.
Conclusions: A novel target and drug prioritization algorithm, PMTDS, is developed to identify optimum target-drug pairs by integrating the knowledgebase with a single patient’s genomic profile. The PMTDS system provides an accurate and reliable source for target and off-label drug selection in precision cancer medicine.
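The general flavor of network-based target prioritization can be sketched as follows. This toy scoring rule, the gene names, and the evidence values are illustrative assumptions only; it is not the published PMTDS scoring function:

```python
def target_score(gene, profile, network, alpha=0.5):
    """Toy network-based prioritization: a gene's score is its own alteration
    evidence plus a discounted sum over its interaction-network neighbors.
    (Illustrative only -- not the published PMTDS algorithm.)"""
    own = profile.get(gene, 0.0)
    neigh = sum(profile.get(n, 0.0) for n in network.get(gene, ()))
    return own + alpha * neigh

# Hypothetical per-gene evidence combining mutation, CNA, and expression change.
profile = {"EGFR": 2.0, "ERBB2": 1.5, "SYK": 0.2, "TP53": 1.0}
network = {"EGFR": ["ERBB2", "TP53"], "SYK": ["TP53"]}
ranked = sorted(network, key=lambda g: target_score(g, profile, network),
                reverse=True)
print(ranked)  # genes whose neighborhoods are altered rank higher
```

The key idea is that a target is prioritized not only by its own alterations but also by the altered state of its network context, which is what allows integration of multiple omics layers.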
Background: In eukaryotic genomes, chromatin is not randomly distributed in the cell nucleus but instead is organized into higher-order structures. Emerging evidence indicates that these higher-order chromatin structures play important roles in regulating genome functions such as transcription and DNA replication. With the advancement of 3C (chromosome conformation capture)-based technologies, Hi-C has been widely used to investigate genome-wide long-range chromatin interactions during cellular differentiation and oncogenesis. Since the first publication of the Hi-C assay in 2009, many bioinformatic tools have been implemented for processing Hi-C data, from mapping raw reads to normalizing contact matrices and downstream interpretation, either providing a whole workflow pipeline or focusing on a particular step.
Results: This article reviews the general Hi-C data processing workflow and currently popular Hi-C data processing tools. We highlight how these tools are used for a full interpretation of Hi-C results.
Conclusions: The Hi-C assay is a powerful tool for investigating higher-order chromatin structure. Continued development of novel methods for Hi-C data analysis will be necessary for a better understanding of the regulatory function of genome organization.
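One representative step in the workflow, contact-matrix normalization, is often done by iterative correction (ICE-style matrix balancing). A minimal sketch under simplifying assumptions (a dense symmetric matrix with no empty bins; production tools handle sparsity and filtering):

```python
import numpy as np

def ice_balance(M, n_iter=50):
    """Minimal ICE-style iterative correction of a symmetric contact matrix:
    repeatedly divide each entry by the relative coverage of its row and
    column bins, so all bins converge to equal total visibility."""
    M = M.astype(float).copy()
    bias = np.ones(M.shape[0])
    for _ in range(n_iter):
        cov = M.sum(axis=1)
        cov /= cov.mean()          # relative coverage per bin
        M /= np.outer(cov, cov)
        bias *= cov
    return M, bias

# Toy 3-bin contact map with one over-covered bin.
raw = np.array([[10., 6., 4.],
                [ 6., 4., 2.],
                [ 4., 2., 2.]])
balanced, bias = ice_balance(raw)
print(balanced.sum(axis=1))   # near-equal row sums after correction
```

The returned bias vector records each bin's multiplicative correction factor, which downstream tools typically store alongside the raw counts rather than overwriting them.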
Background: Aging is a complex systems-level problem that needs a systems-level solution. However, system models of aging and longevity, although urgently needed, are still lacking, largely due to the paucity of conceptual frameworks for modeling such a complex process.
Results: We propose that aging can be viewed as a decline in system capacity, defined as the maximum level of output that a system produces to fulfill demands. Classical aging hallmarks and anti-aging strategies can be well-aligned to system capacity. Genetic variants responsible for lifespan variation across individuals or species can also be explained by their roles in system capacity. We further propose promising directions to develop systems approaches to modulate system capacity and thus extend both healthspan and lifespan.
Conclusions: The system capacity model of aging provides an opportunity to examine aging at the systems level. This model predicts that the extent to which aging can be modulated is normally limited by the upper bound of the system capacity of a species. Within such a boundary, aging can be delayed by moderately increasing an individual’s system capacity. Beyond such a boundary, increasing the upper bound is required, which is not unrealistic given the unlimited potential of regenerative medicine in the future, but it requires increasing the capacity of the whole system instead of only part of it.
Background: The shortage of organs available for transplantation is the major obstacle hindering the application of regenerative medicine, and has become a desperate problem faced by an increasing number of patients. The recent development and application of 3D printing techniques in biological research (bioprinting) has revolutionized tissue engineering methods and become a promising solution for tissue regeneration.
Results: In this review, we summarize the current applications of bioprinting in producing tissues and organoids, and discuss the future directions and challenges of 3D bioprinting.
Conclusions: Currently, 3D bioprinting is capable of generating patient-specific bone, cartilage, blood vascular networks, hepatic units and other simple components/tissues, yet purely cell-based functional organs remain to be achieved.
Background: Restricted Boltzmann machines (RBMs) are endowed with the universal power of modeling (binary) joint distributions. Meanwhile, owing to their constrained network structure, training RBMs poses fewer difficulties in approximation and inference. However, little work has been done to fully exploit the capacity of these models to analyze cancer data, e.g., cancer genomic, transcriptomic, proteomic and epigenomic data. On the other hand, in cancer data analysis the number of features/predictors is usually much larger than the sample size, which is known as the “p≫N” problem and is also ubiquitous in other bioinformatics and computational biology fields. The “p≫N” problem makes the bias-variance trade-off even more crucial when designing statistical learning methods. However, to date, few RBM models have been specifically designed to address this issue.
Methods: We propose a novel RBM model, called elastic restricted Boltzmann machines (eRBMs), which incorporates an elastic regularization term into the likelihood function to balance model complexity and sensitivity. Building on the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm, which can train eRBMs efficiently.
Results: We obtain several theoretical results on the rationality and properties of our model. We further evaluate the power of our model on a challenging task: predicting dichotomized survival time from the molecular profiles of tumors. The test results show that the prediction performance of eRBMs is substantially superior to that of state-of-the-art methods.
Conclusions: The proposed eRBMs are capable of dealing with “p≫N” problems and have superior modeling performance over traditional methods. Our novel model is a promising method for future cancer data analysis.
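The combination of CD training with an elastic-net (L1 + L2) penalty can be sketched as follows. This is an illustrative CD-1 weight update with the regularization gradient subtracted, assuming binary units and omitting biases for brevity; it is not the authors' eCD implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_elastic_step(W, v0, lr=0.05, l1=0.01, l2=0.01):
    """One CD-1 weight update with an elastic-net penalty: the usual
    positive/negative phase gradient estimate minus the gradient of the
    L1 + L2 regularization term (biases omitted for brevity)."""
    h0 = sigmoid(v0 @ W)                          # hidden probabilities
    h_sample = (rng.random(h0.shape) < h0) * 1.0  # sampled hidden states
    v1 = sigmoid(h_sample @ W.T)                  # one-step reconstruction
    h1 = sigmoid(v1 @ W)
    grad = v0.T @ h0 - v1.T @ h1                  # CD-1 gradient estimate
    penalty = l1 * np.sign(W) + l2 * W            # elastic-net gradient
    return W + lr * (grad / v0.shape[0] - penalty)

# Toy data: 32 samples of 8 binary visible units; 4 hidden units.
v = (rng.random((32, 8)) < 0.5) * 1.0
W = 0.1 * rng.normal(size=(8, 4))
for _ in range(10):
    W = cd1_elastic_step(W, v)
print(W.shape)
```

The L1 term drives many weights toward zero (helpful when p is far larger than N), while the L2 term keeps correlated features jointly shrunk rather than arbitrarily dropped, which is the usual motivation for the elastic-net combination.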