Jun 2018, Volume 6 Issue 2
    

Cover illustration

  • Networks are becoming increasingly important tools for representing biological systems. Decomposing such complex networks into modules (or communities) is a good way to uncover their underlying patterns. However, few user-friendly tools exist for module detection in bipartite biological networks. BMTK is an online tool for effectively detecting modules in such networks. It implements seven popular methods in a uniform platform and provides much convenience …


  • REVIEW
    Chaima Aouiche, Xuequn Shang, Bolin Chen

    Background: One of the most important and challenging issues in biomedicine and genomics is how to identify disease-related genes. Datasets from high-throughput biotechnologies have been widely used to address this issue from various perspectives, e.g., epigenomics, genomics, transcriptomics, proteomics, and metabolomics. At the genomic level, copy number variations (CNVs) have been recognized as critical genetic variations that contribute significantly to genomic diversity. They have been associated with both common and complex diseases, and thus have a large influence on a variety of Mendelian and somatic genetic disorders.

    Results: In this review, drawing on a variety of complex diseases, we give an overview of the critical role of CNVs in identifying disease-related genes, and discuss in detail the different high-throughput and sequencing methods applied for CNV detection. Some limitations and challenges concerning CNVs are also highlighted.

    Conclusions: Reliable detection of CNVs will not only allow driver mutations to be discriminated for various diseases, but will also help to develop personalized medicine when integrated with other genomic features.
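    Read-depth analysis is one widely used family of sequencing-based CNV detection methods. As a minimal sketch of the idea (not a method taken from the review), the function below compares per-bin read counts between a sample and a reference, median-normalizes the log2 ratios on the assumption that most bins are copy-neutral, and merges consecutive bins beyond a threshold into gain or loss segments. The function name, the ±0.4 threshold, and the pseudocount are hypothetical illustrative choices.

```python
import math
import statistics


def call_cnv_segments(sample_depth, reference_depth, threshold=0.4, pseudocount=0.5):
    """Call gain/loss segments from per-bin read counts (hypothetical helper).

    Returns a list of (first_bin, last_bin, call, mean_log2_ratio) tuples.
    """
    if not sample_depth or len(sample_depth) != len(reference_depth):
        raise ValueError("depth vectors must be non-empty and cover the same bins")
    # Pseudocount avoids division by zero and log(0) in empty bins.
    raw = [(s + pseudocount) / (r + pseudocount)
           for s, r in zip(sample_depth, reference_depth)]
    # Median normalisation assumes most bins are copy-neutral.
    median_ratio = statistics.median(raw)
    log2_ratios = [math.log2(x / median_ratio) for x in raw]
    calls = ["gain" if x > threshold else "loss" if x < -threshold else "neutral"
             for x in log2_ratios]

    segments = []
    start = 0
    for i in range(1, len(calls) + 1):
        # Close the current run when the call changes or the bins run out.
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != "neutral":
                run = log2_ratios[start:i]
                segments.append((start, i - 1, calls[start], sum(run) / len(run)))
            start = i
    return segments
```

Real pipelines typically add corrections such as GC-content normalization and replace the simple run merging with a proper segmentation algorithm (for example, circular binary segmentation).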

  • REVIEW
    Zhi-Ping Liu

    Background: More and more high-throughput datasets are available that measure gene regulation at multiple levels. Reverse engineering gene regulatory networks from these data offers a valuable research paradigm for deciphering regulatory mechanisms. So far, numerous methods have been developed for reconstructing gene regulatory networks.

    Results: In this paper, we review bioinformatics methods for inferring gene regulatory networks from omics data. To achieve precise reconstruction of gene regulatory networks, an intuitive approach is to integrate the available resources in a rational framework. We also provide computational perspectives on the endeavor of inferring gene regulatory networks from heterogeneous data, and highlight the importance of integrating multi-omics data with prior knowledge in gene regulatory network inference.

    Conclusions: We provide computational perspectives on inferring gene regulatory networks from multiple omics data and present theoretical analyses of existing challenges and possible solutions. We emphasize prior knowledge and data integration in network inference owing to their ability to identify regulatory causality.

  • RESEARCH ARTICLE
    David Skelding, Samuel F M Hart, Thejas Vidyasagar, Alexander E Pozhitkov, Wenying Shou

    Background: Multiplexed milliliter-scale chemostats are useful for measuring cell physiology under various degrees of nutrient limitation and for carrying out evolution experiments. In each chemostat, fresh medium containing a growth rate-limiting metabolite is pumped into the culturing chamber at a constant rate, while culture effluent exits at an equal rate. Although such devices have been developed by various labs, their key parameters (the accuracy, precision, and operational range of the flow rate) have not been explicitly characterized.

    Methods: Here we repurpose a published multiplexed culturing device to develop a multiplexed milliliter-scale chemostat. Flow rates for eight chambers can be independently controlled over a wide range, corresponding to population doubling times of 3 to 13 h, without the use of expensive feedback systems.

    Results: Flow rates are precise, with the maximal coefficient of variation among eight chambers being less than 3%. Flow rates are accurate, with average flow rates only slightly below targets: 3%–6% below for 13-h and 0.6%–1.0% below for 3-h doubling times. This deficit is largely due to evaporation and should be correctable. We experimentally demonstrate that our device allows accurate and precise quantification of population phenotypes.

    Conclusions: We achieve precise control of cellular growth in a low-cost milliliter-scale chemostat array, and show that the achieved precision reduces the error when measuring biological processes.
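    The link between flow rate and doubling time in this abstract follows from standard chemostat theory: at steady state the specific growth rate equals the dilution rate D = F/V, so the doubling time is T_d = ln 2 / D and the flow rate needed for a target doubling time is F = V ln 2 / T_d. The sketch below applies this relation together with the coefficient of variation and the percentage shortfall used to express precision and accuracy. The function names are hypothetical, and the 20 mL chamber volume is an assumed value, not a parameter reported in the abstract.

```python
import math
import statistics


def target_flow_rate(volume_ml: float, doubling_time_h: float) -> float:
    """Flow rate (mL/h) giving the requested doubling time at steady state.

    In a chemostat the steady-state growth rate equals the dilution rate D = F / V,
    and the doubling time is T_d = ln(2) / D, so F = V * ln(2) / T_d.
    """
    dilution_rate_per_h = math.log(2) / doubling_time_h
    return dilution_rate_per_h * volume_ml


def coefficient_of_variation(values):
    """Precision across chambers: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def percent_below_target(measured, target):
    """Accuracy: how far a measured flow rate falls short of its target, in percent."""
    return 100.0 * (target - measured) / target


if __name__ == "__main__":
    chamber_volume_ml = 20.0  # hypothetical volume, not taken from the paper
    for t_d in (3.0, 13.0):  # the doubling-time range quoted in the Methods
        flow = target_flow_rate(chamber_volume_ml, t_d)
        print(f"T_d = {t_d:4.1f} h -> F = {flow:.3f} mL/h")
```

Because F is inversely proportional to T_d, the 3-h setting needs 13/3 (about 4.3) times the flow of the 13-h setting regardless of chamber volume, so the pumps must cover roughly a fourfold range.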

  • RESEARCH ARTICLE
    Guanghui Li, Jiawei Luo, Zheng Xiao, Cheng Liang

    Background: The frequency of small subtrees in biological, social, and other types of networks could shed light on the structure, function, and evolution of such networks. However, counting all possible subtrees of a prescribed size can be computationally expensive because of their potentially large number, even in small, sparse networks. Moreover, most existing algorithms for subtree counting are subtree-centric approaches, which search for a single specific subtree type at a time and can therefore waste time by searching the same network repeatedly.

    Methods: In this paper, we propose a network-centric algorithm (MTMO) to efficiently count k-size subtrees. Our algorithm is based on the enumeration of all connected sets of k − 1 edges, incorporates a labeled rooted tree data structure in the enumeration process to reduce the number of isomorphism tests required, and uses an array-based indexing scheme to simplify subtree counting.

    Results: Experiments on three representative undirected complex networks show that our algorithm is roughly an order of magnitude faster than existing subtree-centric approaches and than a baseline network-centric algorithm that does not use rooted trees, allowing larger subtrees to be counted in larger networks than previously possible. We also show major differences between unicellular and multicellular organisms. In addition, we apply our algorithm to find network motifs using a pattern-growth approach.

    Conclusions: We propose a network-centric algorithm that allows faster counting of non-induced subtrees. This enables us to count larger motifs in larger networks than was previously possible.
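    To illustrate the network-centric strategy described in this abstract, the sketch below enumerates every connected set of k − 1 edges that forms a tree (grown one leaf edge at a time from each single edge), gives each subtree a canonical string, and tallies all isomorphism classes in a single pass over the network. It is a brute-force illustration, not MTMO: in place of the paper's labeled rooted tree structure and array-based indexing, it canonicalizes each subtree with the classical AHU encoding rooted at the tree center(s), and it keeps enumerated edge sets in memory, so it only suits small networks and small k. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def _rooted_code(adj, node, parent):
    # AHU encoding: a node's code wraps its children's codes, sorted, in parentheses.
    children = sorted(_rooted_code(adj, c, node) for c in adj[node] if c != parent)
    return "(" + "".join(children) + ")"


def _centers(adj):
    # Strip leaves layer by layer; the last one or two nodes are the tree centers.
    degree = {v: len(n) for v, n in adj.items()}
    remaining = set(adj)
    leaves = [v for v in adj if degree[v] <= 1]
    while len(remaining) > 2:
        next_leaves = []
        for leaf in leaves:
            remaining.discard(leaf)
            for nbr in adj[leaf]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] == 1:
                        next_leaves.append(nbr)
        leaves = next_leaves
    return remaining


def canonical_tree_code(edge_set):
    """Isomorphism-invariant string for an unrooted tree given as 2-node frozensets."""
    adj = defaultdict(set)
    for edge in edge_set:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    return min(_rooted_code(adj, c, None) for c in _centers(adj))


def count_subtrees(edges, k):
    """Count non-induced k-node subtrees of a simple undirected graph, by isomorphism class."""
    if k < 2:
        raise ValueError("k must be at least 2")
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Start from every single edge; each level adds one edge with exactly one new endpoint.
    frontier = {frozenset([frozenset((u, v))]) for u, v in edges}
    for _ in range(k - 2):
        grown = set()
        for tree in frontier:
            nodes = set().union(*tree)
            for u in nodes:
                for w in adj[u]:
                    if w not in nodes:  # adding a leaf edge keeps the subgraph acyclic
                        grown.add(tree | {frozenset((u, w))})
        frontier = grown  # sets deduplicate trees reached along different growth orders
    return Counter(canonical_tree_code(tree) for tree in frontier)
```

For example, on the complete graph with four nodes this yields 16 four-node subtrees (Cayley's formula), split into 4 stars and 12 paths.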

  • RESEARCH ARTICLE
    Morihiro Hayashida, Noriyuki Okada, Mayumi Kamada, Hitoshi Koyano

    Background: For understanding biological cellular systems, it is important to analyze interactions between protein residues and RNA bases. A method based on conditional random fields (CRFs) was previously developed for predicting contacts between residues and bases; it takes multiple sequence alignments of the given protein and RNA sequences as input and learns a model with many parameters describing relationships between neighboring residue-base pairs by maximizing the pseudo-likelihood function.

    Methods: In this paper, we propose a novel CRF-based model with more complicated dependency relationships between random variables than the previous model, but with fewer parameters to avoid overfitting to training data.

    Results: We performed cross-validation experiments to evaluate the proposed model and averaged the AUC (area under the receiver operating characteristic curve) scores. The results suggest that the proposed CRF-based model without L1-norm regularization (lasso) outperforms the existing model both with and without the lasso under several input observations to the CRFs.

    Conclusions: We propose a novel stochastic model for predicting protein-RNA residue-base contacts that improves prediction accuracy in terms of the AUC score. This implies that more dependency relationships in a CRF can be controlled with fewer parameters.
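    The evaluation protocol in this abstract, cross-validated AUC averaged over folds, can be sketched as follows. AUC is computed here through its rank interpretation: the probability that a randomly chosen contact (label 1) receives a higher score than a randomly chosen non-contact (label 0), with ties counted as one half. This is a minimal, quadratic-time illustration with hypothetical function names, not the authors' evaluation code, and it leaves out training the CRF itself.

```python
from statistics import mean


def auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_cv_auc(folds):
    """Average AUC over cross-validation folds, each given as (labels, scores) for held-out pairs."""
    return mean([auc(labels, scores) for labels, scores in folds])
```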

  • RESEARCH ARTICLE
    Yongxiao Yang, Wei Wang, Yuan Lou, Jianxin Yin, Xinqi Gong

    Background: Protein-protein interactions are essential to many biological processes. Binding site information for protein-protein complexes is extremely useful for determining their structures from biochemical experiments. Geometric description of protein structures is a prerequisite for protein binding site prediction and protein-protein interaction analysis. Previous descriptions of protein surface residues are incomplete, and little attention has been paid to the implications of residue types for binding site prediction.

    Methods: Here, we identified three new geometric features characterizing protein surface residues that are very effective for predicting protein-protein interface residues. The new features and several commonly used descriptors were employed to train millions of residue-type-nonspecific and residue-type-specific protein binding site predictors.

    Results: The amino acid type-specific predictors are superior to models that do not distinguish amino acid types. The best predictors perform much better than previously developed sophisticated methods.

    Conclusions: The results demonstrate that geometric properties and amino acid types very likely determine whether a protein surface residue becomes an interface residue when the protein binds to its partner.

  • METHODOLOGY ARTICLE
    Shansong Liu, Kui Hua, Sijie Chen, Xuegong Zhang

    Background: Metagenomic sequencing is a complex procedure of sampling from unknown mixtures of many genomes. Metagenome data with known genome compositions are essential both for benchmarking bioinformatics software and for investigating the influence of various factors on the data. Compared with data from real microbiome samples or from defined microbial mock communities, data simulated with proper computational models are better suited to this purpose because they provide more flexibility for controlling multiple factors.

    Methods: We developed a non-uniform metagenomic sequencing simulation system (nuMetaSim) that is capable of mimicking various factors in real metagenomic sequencing to reflect multiple properties of real data with customizable parameter settings.

    Results: We generated 9 comprehensive metagenomic datasets of different composition complexity from 203 bacterial genomes and 2 archaeal genomes related to the human intestinal system.

    Conclusion: The data can serve as benchmarks for comparing the performance of different methods in different situations, and the software package allows users to generate simulated data that better reflect the specific properties of their own scenarios.
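    One non-uniformity any metagenomic simulator has to reproduce is that reads are not drawn evenly from the community: when abundance is measured in genome copies, a genome's expected share of reads scales with its relative abundance times its length. The sketch below illustrates only that step, plus uniform read start positions and a simple substitution-error model. It is a hypothetical minimal simulator (the function name and parameters are illustrative), not nuMetaSim, and it does not reproduce the other sequencing factors nuMetaSim is designed to mimic.

```python
import random


def simulate_reads(genomes, abundances, n_reads, read_len, error_rate=0.0, seed=0):
    """Sample error-prone reads from a mock community (hypothetical helper, not nuMetaSim).

    genomes: dict name -> DNA sequence; abundances: dict name -> relative genome-copy abundance.
    Returns a list of (genome_name, start, read_sequence) tuples.
    """
    rng = random.Random(seed)
    names = [g for g in genomes if len(genomes[g]) >= read_len]
    if not names:
        raise ValueError("no genome is at least read_len long")
    # A genome's expected share of reads is proportional to abundance x genome length.
    weights = [abundances[g] * len(genomes[g]) for g in names]
    reads = []
    for name in rng.choices(names, weights=weights, k=n_reads):
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)  # uniform start position
        read = list(seq[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:  # substitution error at this position
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((name, start, "".join(read)))
    return reads
```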

  • SOFTWARE ARTICLE
    Bei Wang, Jinyu Chen, Shihua Zhang

    Background: Module detection is widely used to analyze and visualize biological networks, and a number of methods and tools have been developed for it. Bipartite module detection is likewise very useful for mining and analyzing bipartite biological networks, and a few methods have been developed for it. However, there are few user-friendly toolkits for this task.

    Methods: To this end, we develop an online web toolkit BMTK, which implements seven existing methods.

    Results: BMTK provides a uniform operating platform and visualization functions, standardizes input and output formats, and improves algorithmic structure to increase computing speed. We also apply this toolkit to a drug-target bipartite network to demonstrate its effectiveness.

    Conclusions: BMTK will be a powerful tool for detecting bipartite modules in diverse bipartite biological networks.

    Availability: The web application is freely accessible at the website of Zhang lab.
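    The abstract does not name BMTK's seven methods, but many bipartite module detection algorithms score partitions with Barber's bipartite modularity, Q = (1/m) Σ_{i,j} (A_ij − k_i d_j / m) δ(g_i, h_j), where m is the number of edges, k_i and d_j are the degrees of row node i (e.g., a drug) and column node j (e.g., a target), and g_i, h_j are their module labels. The function below computes Q for a given unweighted partition; it is a hypothetical helper for illustration, not part of BMTK.

```python
from collections import defaultdict


def bipartite_modularity(edges, row_module, col_module):
    """Barber's modularity for an unweighted bipartite graph.

    edges: iterable of (row_node, col_node) pairs, e.g. (drug, target).
    row_module, col_module: dicts mapping each node to its module label.
    """
    edge_set = set(edges)  # duplicate edges are counted once
    m = len(edge_set)
    if m == 0:
        raise ValueError("graph has no edges")
    row_degree = defaultdict(int)
    col_degree = defaultdict(int)
    for r, c in edge_set:
        row_degree[r] += 1
        col_degree[c] += 1
    q = 0.0
    # Only same-module (row, column) pairs contribute: observed edge minus expected k_i * d_j / m.
    for r, k_r in row_degree.items():
        for c, d_c in col_degree.items():
            if row_module[r] == col_module[c]:
                observed = 1.0 if (r, c) in edge_set else 0.0
                q += observed - k_r * d_c / m
    return q / m
```

Putting every node into a single module always gives Q = 0, so positive values indicate that the proposed modules contain more drug-target edges than expected from the node degrees alone.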