Cover illustration
Transcription factors (TFs) regulate the expression levels of target genes and thereby affect biological functions. Song et al. developed EpiFIT, an online tool to infer the functions of TFs using sequence and epigenetic data. Through a series of validation experiments, they verified that EpiFIT can precisely interpret TF functions and build associations between distal TF binding sites and regulated genes with the help of epigenetic information. In short, EpiFIT is a powerful tool for a ...
Background: Traditionally, scientists studied microbiology using batch cultures, inferring dynamics or outputs by averaging over all individuals. However, as research has advanced, heterogeneity among individuals has proven to be crucial for population dynamics and fates.
Results: Owing to technological limitations, single-cell analysis methods were previously not widely used to decipher the inherent connections between individual cells and populations. Since the early decades of this century, the rapid development of microfluidics, fluorescent labelling, next-generation sequencing, and high-resolution microscopy has accelerated the development of single-cell technologies and further facilitated their application to bacterial analysis.
Conclusions: In this review, we summarize recent progress in single-cell technologies applied to bacterial analysis in terms of intracellular characteristics, cell physiology dynamics, and group behaviors, and discuss how single-cell technologies could become more applicable to future bacterial research.
Background: Arsenic has broad anti-cancer activity against hematologic malignancies and solid tumors. To systematically understand the biological functions of arsenic, we need to identify arsenic-binding proteins in human cells. However, owing to a lack of effective theoretical tools and experimental methods, only a few arsenic-binding proteins have been identified.
Methods: Based on the crystal structure of ArsM, we generated a single-mutation free energy profile for arsenic binding using free energy perturbation methods. Multiple validations indicate that our computational model can predict arsenic-binding proteins with desirable accuracy. We subsequently applied this computational model to scan the entire human genome and identify all potential arsenic-binding proteins.
Results: The computationally predicted arsenic-binding proteins show a wide range of biological functions, especially in signal transduction pathways. In these pathways, arsenic directly binds to key factors (e.g., Notch receptors, Notch ligands, Wnt family proteins, TGF-beta, and their interacting proteins) and significantly inhibits their enzymatic activities, which in turn has a crucial impact on the related signaling pathways.
Conclusions: Arsenic has a significant impact on signal transduction in cells. Arsenic binding can lead to dysfunction of the target proteins, with crucial impacts on both signaling pathways and gene transcription. We hope that the computationally predicted arsenic-binding proteins and the functional analysis can provide novel insight into the biological functions of arsenic, revealing a mechanism for its broad anti-cancer activity.
Background: Common variable immunodeficiency (CVID), the most prevalent form of primary immunodeficiency (PID), is characterized by hypogammaglobulinemia and recurrent infections. Understanding protein-protein interaction (PPI) networks of CVID genes and identifying candidate CVID genes are critical steps in facilitating the early diagnosis of CVID. Here, the aim was to investigate PPI networks of CVID genes and identify candidate CVID genes using computational techniques.
Methods: Network density and biological distance were used to study PPI data for CVID and PID genes obtained from the STRING database. Gene expression data of patients with CVID were obtained from the Gene Expression Omnibus, and then Pearson’s correlation coefficient, a PPI database, and Kyoto Encyclopedia of Genes and Genomes were used to identify candidate CVID genes. We then evaluated our predictions and identified differentially expressed CVID genes.
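As an illustration of two quantities named in the Methods above, the following minimal sketch computes the density of a gene set within a PPI network and Pearson's correlation coefficient between two expression profiles. The definition of density used here (observed edges divided by possible gene pairs) is an assumption, since the abstract does not state the exact formula, and the gene identifiers and values are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def network_density(genes, edges):
    """Fraction of possible gene pairs in `genes` that are linked by a PPI edge.

    Assumed definition: observed edges / (n * (n - 1) / 2).
    """
    genes = sorted(set(genes))
    if len(genes) < 2:
        return 0.0
    edge_set = {frozenset(e) for e in edges}
    possible = len(genes) * (len(genes) - 1) // 2
    present = sum(
        1 for a, b in combinations(genes, 2) if frozenset((a, b)) in edge_set
    )
    return present / possible


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical toy network: G1, G2 and G3 form a triangle; G4 hangs off G3.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G3", "G4")]
dense_set = network_density(["G1", "G2", "G3"], toy_edges)
whole_set = network_density(["G1", "G2", "G3", "G4"], toy_edges)
```

In this toy network the triangle is fully connected, while adding G4 lowers the density because only four of the six possible pairs interact.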
Results: The majority of CVID genes are characterized by a high network density and small biological distance, whereas most PID genes are characterized by a low network density and large biological distance, indicating that CVID genes are more functionally similar to each other and interact more closely with one another than PID genes do. Subsequently, we identified 172 CVID candidate genes that have similar biological functions to known CVID genes, and eight genes were recently reported as CVID-related genes. MYC, a candidate gene, was down-regulated in CVID duodenal biopsies, but up-regulated in blood samples compared with levels in healthy controls.
Conclusion: Our findings will aid in a better understanding of the complex of CVID genes, possibly further facilitating the early diagnosis of CVID.
Background: MicroRNAs (miRNAs) are a significant type of non-coding RNA, usually encoded by endogenous genes and about 22 nucleotides long. Accumulating biological experiments have shown that miRNAs are closely associated with various human diseases. Although traditional experimental methods have achieved great success in identifying miRNA-disease interactions, these methods also have limitations. Therefore, it is necessary to develop computational methods to predict miRNA-disease interactions.
Methods: Here, we propose a computational framework (MDVSI) to predict interactions between miRNAs and diseases by integrating miRNA topological similarity and functional similarity. First, the CosRA index is used to measure miRNA similarity based on network topological features. Then, to enhance the reliability of the miRNA similarity, the functional similarity and the CosRA similarity are integrated using a linear weighting method. Next, potential miRNA-disease associations are predicted using a recommendation method. In addition, to overcome the limitation of the recommendation method for new diseases, a new strategy is proposed to predict potential interactions between miRNAs and a new disease based on disease functional similarity.
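The similarity integration described above can be sketched as follows. This is an illustration only, not the MDVSI implementation: the CosRA formula used is the commonly cited form, s_xy = (1 / sqrt(k_x k_y)) * sum over shared neighbors l of 1 / k_l, and whether MDVSI uses exactly this form is an assumption. The weight `alpha`, the identifiers, and the toy associations are hypothetical.

```python
from math import sqrt


def cosra(assoc, x, y):
    """CosRA similarity of miRNAs x and y in a bipartite association network.

    assoc maps each miRNA to the set of diseases it is associated with.
    Assumed form: sum over shared diseases of 1/k_disease, divided by
    sqrt(k_x * k_y), where k denotes node degree.
    """
    k_x, k_y = len(assoc[x]), len(assoc[y])
    if k_x == 0 or k_y == 0:
        return 0.0
    # Degree of each disease = number of miRNAs associated with it.
    degree = {}
    for diseases in assoc.values():
        for d in diseases:
            degree[d] = degree.get(d, 0) + 1
    shared = assoc[x] & assoc[y]
    return sum(1.0 / degree[d] for d in shared) / sqrt(k_x * k_y)


def integrate(topological, functional, alpha=0.5):
    """Linear weighting of two similarity values; alpha is a hypothetical weight."""
    return alpha * topological + (1.0 - alpha) * functional


# Hypothetical toy associations.
toy_assoc = {
    "m1": {"d1", "d2"},
    "m2": {"d1"},
    "m3": {"d2", "d3"},
}
topo_m1_m3 = cosra(toy_assoc, "m1", "m3")
combined = integrate(topo_m1_m3, functional=0.75, alpha=0.5)
```

Setting `alpha` closer to 1 would make the combined score rely more on network topology; closer to 0, more on functional similarity.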
Results: To evaluate the performance of different methods, we conduct ten-fold cross-validation and a de novo test, and compare MDVSI with two state-of-the-art methods. The experimental results show that MDVSI achieves an AUC of 0.91, which is at least 0.012 higher than the other compared methods.
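For readers unfamiliar with the AUC metric reported above, a minimal sketch of how it can be computed from prediction scores follows. It uses the rank interpretation of AUC (the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair); the scores and labels are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a known association, 0 otherwise. Ties count as half a win.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four candidate miRNA-disease pairs.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In cross-validation, a value like this would be computed on each held-out fold and then averaged.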
Conclusions: In summary, we propose a computational framework (MDVSI) for miRNA-disease interaction prediction. The experimental results demonstrate that it outperforms other state-of-the-art methods. A case study shows that it can effectively identify potential miRNA-disease interactions.
Background: Traditional Chinese medicine (TCM) has recently attracted considerable attention from various disciplines. However, TCM remains mysterious because of its unique philosophy and theoretical thinking. Owing to the lack of high-quality data, understanding TCM thoroughly faces critical challenges. In this study, we introduce the Zhou Archive, a large-scale database of expert-specific Electronic Medical Records (EMRs) containing information about 73,000+ visits to one TCM doctor over more than 35 years. Covering the full spectrum of the diagnosis-treatment model behind TCM practice, the archive provides an opportunity to understand TCM from a data-driven perspective.
Methods: Through a series of data processing steps, we transformed the semi-structured EMRs in the archive into a well-structured feature table. Based on this feature table, a series of statistical analyses were performed to learn principles of TCM clinical practice from the archive, including correlation analysis, enrichment analysis, embedding analysis, and association pattern discovery.
Results: A structured feature table of 14,000+ features is generated at the end of the proposed data processing procedure, with a feature codebook, a term dictionary, and a term-feature map as byproducts. Statistical analysis of the feature table reveals underlying principles of the diagnosis-treatment model of TCM, helping us better understand TCM practice from a data-driven perspective.
Conclusion: Expert-specific EMRs provide opportunities to understand TCM from a data-driven perspective. Taking advantage of recent progress in NLP for Chinese, we can process a large number of TCM EMRs efficiently to gain insights via statistical analysis.
Background: Transcription factors are among the most important regulators of the transcriptional process. Nevertheless, the functional interpretation of transcription factors remains a major challenge because methods that relate regulatory regions to genes perform poorly. Epigenetic information, such as chromatin accessibility, contains genome-wide knowledge about transcription regulation and thus may shed light on the functional interpretation of transcription factors.
Methods: We propose EpiFIT (Epigenetic based Functional Interpretation of Transcription factors), a tool to infer functions of transcription factors from ChIP-seq data. Briefly, we adopt a variable distance rule to establish associations between regulatory regions and nearby genes. The associations are then filtered to ensure that the remaining regions and associated genes are co-open. Finally, GO enrichment is applied to all related genes and a ranking list of GO terms is provided as functional interpretation.
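The three steps described above can be sketched as follows. This is only an illustration, not EpiFIT's implementation: a fixed distance window stands in for the variable distance rule, whose details are not given here; the openness threshold, coordinates, and all identifiers are hypothetical; and enrichment is computed with a plain hypergeometric test.

```python
from math import comb


def associate(regions, genes, max_dist):
    """Pair each region with genes whose TSS lies within max_dist on the same chromosome.

    regions: list of (region_id, chrom, midpoint)
    genes:   list of (gene_id, chrom, tss)
    A fixed window is a simplification of a variable distance rule.
    """
    pairs = []
    for region_id, r_chrom, midpoint in regions:
        for gene_id, g_chrom, tss in genes:
            if r_chrom == g_chrom and abs(midpoint - tss) <= max_dist:
                pairs.append((region_id, gene_id))
    return pairs


def co_open(pairs, region_open, gene_open, threshold):
    """Keep pairs where both region and gene openness reach the (hypothetical) threshold."""
    return [
        (r, g)
        for r, g in pairs
        if region_open[r] >= threshold and gene_open[g] >= threshold
    ]


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes among n drawn from N, K annotated."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical toy inputs.
toy_regions = [("r1", "chr1", 1000), ("r2", "chr1", 50000)]
toy_genes = [("gA", "chr1", 1500), ("gB", "chr1", 49000), ("gC", "chr2", 1200)]
toy_pairs = associate(toy_regions, toy_genes, max_dist=2000)
kept = co_open(
    toy_pairs,
    region_open={"r1": 0.9, "r2": 0.1},
    gene_open={"gA": 0.8, "gB": 0.9, "gC": 0.5},
    threshold=0.5,
)
p_value = hypergeom_pvalue(k=3, n=3, K=4, N=10)
```

Ranking GO terms by such p-values, after correcting for multiple testing, would give a list comparable in spirit to the ranked output described in the Methods.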
Results: We first examined the chromatin openness correlation between regulatory regions and associated genes. The correlation can help EpiFIT purify regulatory region–gene associations. By evaluating EpiFIT on a set of real data, we demonstrated that EpiFIT outperforms other existing methods in precisely interpreting transcription factor functions. We further verified the effectiveness of openness in interpretation and the ability of EpiFIT to build distal region-gene associations.
Conclusion: EpiFIT is a powerful tool for interpreting transcription factor functions. We believe EpiFIT will facilitate the functional interpretation of other regulatory elements, and thus open a new door to understanding regulatory mechanisms.
Availability: The application is freely accessible at bioinfo.au.tsinghua.edu.cn/openness/EpiFIT/.