Dec 2019, Volume 7 Issue 4
    

Cover illustration

  • The type III secretion system (T3SS) is a specialized protein delivery system in gram-negative bacteria, and the type III secreted effectors (T3SEs) play an important part in disease development for many plant and animal pathogens. Computational identification of T3SEs is a very challenging task in bioinformatics due to the lack of a defined secretion signal and great sequence diversity. To exploit T3SE sequence information, Fu et al. employed a word embedding method to capture sem ...


  • MINI REVIEW
    Jie Zheng, Ke Wang

    Deep learning is making major breakthroughs in several areas of bioinformatics. Anticipating that this will soon occur for single-cell RNA-seq data analysis, we review newly published deep learning methods that help tackle its computational challenges. Autoencoders are found to be the dominant approach. However, methods based on deep generative models such as generative adversarial networks (GANs) are also emerging in this area.

  • REVIEW
    Wazim Mohammed Ismail, Etienne Nzabarushimana, Haixu Tang

    Background: The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones.

    Results: In this paper, we review the theoretical framework and assumptions on which the clonal reconstruction problem is formulated. We formally define the problem and then discuss its complexity and solution space. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single-cell sequencing or by whole-genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that have been developed for directly solving either the clonal reconstruction problem or a related computational problem.

    Conclusions: In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequency data, review existing algorithms, and categorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes is less addressed. We conclude the review by discussing further open problems, such as the lack of benchmark datasets and of performance comparisons between available tools.

  • REVIEW
    Farzane Yahyanejad, Réka Albert, Bhaskar DasGupta

    Background: Since biological systems are complex and often involve multiple types of genomic relationships, tensor analysis methods can be utilized to elucidate these hidden complex relationships. There is a pressing need for this, as the interpretation of the results of high-throughput experiments has advanced at a much slower pace than the accumulation of data.

    Results: In this review we provide an overview of some tensor analysis methods for biological systems.

    Conclusions: Tensors are natural and powerful generalizations of vectors and matrices to higher dimensions and play a fundamental role in physics, mathematics and many other areas. Tensor analysis methods can provide the foundations of systematic approaches for distinguishing significant higher-order correlations among the elements of a complex system, by finding ensembles of a small number of reduced systems that give a concise and representative summary of these correlations.

  • REVIEW
    Raffaella Rizzi, Stefano Beretta, Murray Patterson, Yuri Pirola, Marco Previtali, Gianluca Della Vedova, Paola Bonizzoni

    Background: De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn graphs have become the dominant technical device in the last decade. Those two kinds of graphs are collectively called assembly graphs.

    Results: In this review, we discuss the most recent advances in the problem of constructing, representing and navigating assembly graphs, focusing on very large datasets. We will also explore some computational techniques, such as the Bloom filter, to compactly store graphs while keeping all functionalities intact.

    Conclusions: We complete our analysis with a discussion on the algorithmic issues of assembling from long reads (e.g., PacBio and Oxford Nanopore). Finally, we present some of the most relevant open problems in this field.
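
    As an illustration of the Bloom filter idea mentioned in this abstract, the following is a minimal sketch, not taken from the review itself. It stores the k-mers of a read set in a Bloom filter and navigates the implicit de Bruijn graph by querying the four possible one-base extensions of a k-mer. The class and function names, the filter size, the number of hashes, and the SHA-256 double-hashing scheme are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Bit array with several hash positions per item.

    Membership queries can return false positives but never false negatives.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_filter(reads, k, num_bits=1 << 16, num_hashes=4):
    """Insert every k-mer of every read into a new Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def successors(bf, kmer):
    """Candidate out-neighbours of a k-mer in the implicit de Bruijn graph."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bf]
```

    Because the filter can report false positives, `successors` may occasionally return a k-mer that was never inserted; practical tools add further structures to control such spurious edges.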

  • RESEARCH ARTICLE
    Xiaofeng Fu, Yang Yang

    Background: The type III secreted effectors (T3SEs) are among the indispensable proteins in the growth and reproduction of Gram-negative bacteria. In particular, the pathogenesis of Gram-negative bacteria depends on the type III secreted effectors: by injecting T3SEs into a host cell, the bacteria can destroy the host cell’s immunity. The high diversity of T3SE sequences and the lack of defined secretion signals make T3SEs difficult to identify and predict. Moreover, the study of the pathological system associated with T3SEs remains a hot topic in bioinformatics. Some computational tools have been developed to meet the growing demand for the recognition of T3SEs and the study of type III secretion systems (T3SS). Although these tools can help biological experiments in certain procedures, there is still room for improvement, even for the current best model, as the existing methods adopt hand-designed features and traditional machine learning methods.

    Methods: In this study, we propose a powerful predictor based on deep learning methods, called WEDeepT3. Our work consists mainly of three key steps. First, we train word embedding vectors for protein sequences on a large-scale amino acid sequence database. Second, we combine the word vectors with traditional features extracted from protein sequences, such as PSSM, to construct a more comprehensive feature representation. Finally, we construct a deep neural network model for the prediction of type III secreted effectors.

    Results: The feature representation of WEDeepT3 consists of both word embedding and position-specific features. Working together with convolutional neural networks, the new model achieves superior performance to the state-of-the-art methods, demonstrating the effectiveness of the new feature representation and the powerful learning ability of deep models.

    Conclusion: WEDeepT3 exploits both semantic information of k-mer fragments and evolutionary information of protein sequences to accurately differentiate between T3SEs and non-T3SEs. WEDeepT3 is available at bcmi.sjtu.edu.cn/~yangyang/WEDeepT3.html.
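
    To make the notion of k-mer "words" concrete, here is a minimal sketch of how a protein sequence might be split into overlapping k-mers and mapped to integer indices before a word-embedding model is trained. This is not the WEDeepT3 code: the value k = 3, the overlapping tokenization, and the function names are assumptions for illustration, and the embedding training and PSSM features are not shown.

```python
from collections import Counter


def protein_to_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words'."""
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocabulary(sequences, k=3, min_count=1):
    """Map each k-mer seen at least min_count times to an integer index."""
    counts = Counter(
        word for seq in sequences for word in protein_to_words(seq, k)
    )
    kept = sorted(word for word, count in counts.items() if count >= min_count)
    return {word: index for index, word in enumerate(kept)}


def encode(sequence, vocab, k=3):
    """Convert a sequence to vocabulary indices, skipping unknown k-mers."""
    return [vocab[w] for w in protein_to_words(sequence, k) if w in vocab]
```

    The resulting lists of words play the role of sentences for an embedding trainer, and the integer encoding is the usual input to an embedding layer in a neural network.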

  • RESEARCH ARTICLE
    Vijayakumar Subramaniyan, Reetha Sekar, Arulmozhi Praveenkumar, Rajalakshmi Selvam

    Background: Hepatitis B virus (HBV) has affected over 300 million people worldwide and mainly causes liver disease and liver cancer. It is a small DNA virus of the family Hepadnaviridae, with unusual characteristics resembling those of retroviruses. Generally, hepatoprotective drugs provoke some side effects in human beings. For this reason, this study aims to identify alternative drug molecules from medicinal plants that have fewer side effects than conventional drugs in treating HBV.

    Methods: We developed computational methods for calculating drug and target binding resemblance using Maestro v10.2 of the Schrödinger suite. The target and ligand molecules were obtained from recognized databases. Ligand molecules of 40 phytoconstituents from a variety of plants were retrieved and subjected to key analyses such as molecular docking and absorption, distribution, metabolism, and excretion (ADME) analysis.

    Results: In the docking analysis, the natural analogue repandusinic acid showed a favorable docking score of –14.768 with good binding contacts. The remaining bioactive molecules, corilagin, furosin, nirurin, iso-quercetin and gallocatechin, also showed favorable docking scores.

    Conclusion: This computational analysis reveals that repandusinic acid is a suitable drug candidate for HBV. Therefore, we recommend this analogue for further exploration in in vitro studies.

  • RESEARCH ARTICLE
    Zongliang Yue, Thanh Nguyen, Eric Zhang, Jianyi Zhang, Jake Y. Chen

    Background: In network biology, researchers generate biomolecular networks from candidate genes or proteins experimentally derived from high-throughput data and from known biomolecular associations. Current bioinformatics research focuses on characterizing candidate genes/proteins, or nodes, with network characteristics, e.g., betweenness centrality. However, there have been few research reports that characterize and prioritize biomolecular associations (“edges”), which can represent gene regulatory events essential to biological processes.

    Method: We developed Weighted In-Path Edge Ranking (WIPER), a new computational algorithm that can help evaluate all biomolecular interactions/associations (“edges”) in a network model and generate a rank order of every edge based on its in-path traversal score and statistical significance test result. To validate whether WIPER worked as designed, we tested the algorithm on synthetic network models.

    Results: Our results showed that WIPER can reliably discover both critical “well traversed in-path edges”, which are statistically more traversed than normal edges, and “peripheral in-path edges”, which are less traversed than normal edges. Compared with other simple measures such as betweenness centrality, WIPER provides better biological interpretations. In the case study analyzing gene expression in postnatal pig hearts, WIPER highlighted new signaling pathways suggestive of cardiomyocyte regeneration and proliferation. In the case study of Alzheimer’s disease genetic disorder association, WIPER reported the SRC:APP, AR:APP, APP:FYN, and APP:NES edges (gene-gene associations) as both statistically and biologically important based on PubMed co-citation.

    Conclusion: We believe that WIPER will become an essential software tool to help biologists discover and validate essential signaling/regulatory events from high-throughput biology data in the context of biological networks.

    Availability: The free WIPER API is described at discovery.informatics.uab.edu/wiper/
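
    The idea of scoring edges by how often paths between candidate nodes traverse them can be sketched as follows. This is a deliberately simplified illustration and not the WIPER algorithm: it uses unweighted breadth-first shortest paths, takes a single shortest path per pair of candidate ("terminal") nodes, and omits the weighting and the statistical significance test described in the abstract. All function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def shortest_path(adj, source, target):
    """Return one shortest path from source to target by BFS, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour in sorted(adj.get(node, ())):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def edge_traversal_counts(edges, terminals):
    """Count how many terminal-to-terminal shortest paths use each edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = Counter()
    for source, target in combinations(sorted(terminals), 2):
        path = shortest_path(adj, source, target)
        if path is None:
            continue
        for u, v in zip(path, path[1:]):
            counts[frozenset((u, v))] += 1
    return counts


def rank_edges(edges, terminals):
    """Return undirected edges sorted from most to least traversed."""
    counts = edge_traversal_counts(edges, terminals)
    unique = {frozenset(edge) for edge in edges}
    return sorted(unique, key=lambda e: (-counts[e], sorted(e)))
```

    In this toy scoring, an edge that lies on many paths between candidate nodes ranks high, while an edge leading only to a non-candidate leaf is never traversed and ranks last.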

  • PROTOCOL AND TUTORIAL
    Hao Feng, Hao Wu

    Bisulfite sequencing (BS-seq) technology measures DNA methylation at single-nucleotide resolution. A key task in BS-seq data analysis is to identify differential methylation (DM) under different conditions. Here we provide a tutorial for BS-seq DM analysis using the Bioconductor package DSS. DSS uses a beta-binomial model to characterize the sequence counts from BS-seq, and implements rigorous statistical methods for hypothesis testing. It provides flexible functionalities for a variety of DM analyses.
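
    To illustrate what a beta-binomial model for methylation counts looks like, the sketch below evaluates the beta-binomial log-probability of observing x methylated reads out of n total reads at a site. It is a standalone Python illustration, not the DSS implementation or its R interface. The parameterization by a mean methylation level mu and a dispersion phi, with alpha = mu(1 - phi)/phi and beta = (1 - mu)(1 - phi)/phi, is an assumption made here for the example.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(x, n, mu, phi):
    """Log-probability of x methylated reads out of n.

    mu is the mean methylation level and phi the dispersion, both in (0, 1).
    This mean/dispersion parameterization is an assumption of this sketch.
    """
    if not (0 < mu < 1 and 0 < phi < 1):
        raise ValueError("mu and phi must lie strictly between 0 and 1")
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    alpha = mu * (1 - phi) / phi
    beta = (1 - mu) * (1 - phi) / phi
    log_choose = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return log_choose + log_beta(x + alpha, n - x + beta) - log_beta(alpha, beta)


def beta_binomial_pmf(x, n, mu, phi):
    """Probability of x methylated reads out of n under the same model."""
    return exp(beta_binomial_logpmf(x, n, mu, phi))
```

    Larger values of phi spread probability toward extreme counts, which is how the model captures variation between biological replicates beyond binomial sampling noise.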

  • MEETING REPORT
    Feng Liu, Yihan Lin, Chunmei Li, Miaomiao Tian