Dec 2017, Volume 5 Issue 4

Cover illustration

  • Precision medicine attempts to tailor the right therapy for the right patient. However, powerful computational methods for optimum target-drug recommendation are still lacking. A novel computational method, Precision Medicine Target-Drug Selection (PMTDS), is developed by Vasudevaraja et al. in this issue. It prioritizes target-drug pairs for individualized cancer treatment based on genetic interaction networks and multi-omics data. Large-scale validation on a [Detail] ...

    Zhaohui S. Qin
    Lu Wang, Lipi Acharya, Changxin Bai, Dongxiao Zhu

    Background: The precision medicine approach holds great promise for tailored diagnosis, treatment and prevention. Individuals can differ vastly in their genomic information and genetic mechanisms, and hence have unique transcriptomic signatures. The development of precision medicine has demanded moving beyond DNA sequencing (DNA-Seq) to much more pointed RNA-sequencing (RNA-Seq) [Cell, 2017, 168: 584–599].

    Results: Here we conduct a brief survey of recent methodological developments in transcriptome assembly approaches using RNA-Seq.

    Conclusions: Since transcriptomes in human disease are highly complex, dynamic and diverse, transcriptome assembly is playing an increasingly important role in precision medicine research to dissect the molecular mechanisms of the human diseases.

    Jie Zheng, Huan Li, Qingzhi Liu, Yongqun He

    Background: The community-based Ontology of Biological and Clinical Statistics (OBCS) represents and standardizes biological and clinical data and statistical methods.

    Methods: Both OBCS and the Vaccine Ontology (VO) were used to ontologically model various components and relations in a typical host response to vaccination study. Such a model was then applied to represent and compare three microarray studies of host responses to the yellow fever vaccine YF-17D. A literature meta-analysis was then conducted to survey yellow fever vaccine response papers and summarize statistical methods, using OBCS.

    Results: A general ontological model was developed to identify major components in a typical host response to vaccination. Our ontology modeling of three similar studies identified common and different components which may contribute to varying conclusions. Although these three studies all used the same vaccine, human blood samples, similar sample collection time post vaccination, and microarray assays, statistically differentially expressed genes and associated gene functions differed, likely due to the differences in specific variables (e.g., microarray type and human variations). Our manual annotation of 95 papers in human responses to yellow fever vaccines identified 38 data analysis methods. These statistical methods were consistently represented and classified with OBCS. Eight statistical methods not available in existing ontologies were added to OBCS.

    Conclusions: The study represents the first single use case of applying OBCS ontology to standardize, integrate, and use biomedical data and statistical methods. Our ontology-based meta-analysis showed that different experimental results might be due to different experimental assays and conditions, sample variations, and data analysis methods.

    Saurav Mallik, Zhongming Zhao

    Background: Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles.

    Methods: We first collect the genes having both expression and methylation values in AML as well as PC. Next, we run the Jarque-Bera normality test on the expression/methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. Thus, we now have four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. A feature-extraction technique, “mRMR”, is then applied to each part. This results in a list of top-ranked genes. Next, we apply the Welch t-test (parametric test) and the Shrink t-test (non-parametric test) to the expression/methylation data for the top selected normally distributed genes and non-normally distributed genes, respectively. We then use a recent weighted ARM method, “RANWAR”, to combine all/specific resultant genes to generate top oncogenic rules along with the respective integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene-Ontology (GO) analyses using the Enrichr database for in silico validation of the prioritized oncogenes as markers, and label the markers as existing or novel.
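    The normality split and the parametric test in the steps above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function names, the 0.05 threshold, and the use of the asymptotic chi-square(2) approximation for the Jarque-Bera p-value are assumptions made for illustration, and the mRMR, Shrink t-test and RANWAR steps are not shown.

```python
import math


def jarque_bera(values):
    """Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # biased central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def looks_normal(values, alpha=0.05):
    """Assumed decision rule: keep normality unless the JB p-value < alpha.

    Under normality JB is asymptotically chi-square with 2 degrees of
    freedom, whose survival function is exp(-x / 2).
    """
    p_value = math.exp(-jarque_bera(values) / 2.0)
    return p_value >= alpha


def split_by_normality(profiles, alpha=0.05):
    """Split a {gene: values} mapping into (normal, non_normal) mappings."""
    normal, non_normal = {}, {}
    for gene, values in profiles.items():
        target = normal if looks_normal(values, alpha) else non_normal
        target[gene] = values
    return normal, non_normal


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    se2 = var_a / na + var_b / nb
    t_stat = (mean_a - mean_b) / math.sqrt(se2)
    df = se2 ** 2 / ((var_a / na) ** 2 / (na - 1) + (var_b / nb) ** 2 / (nb - 1))
    return t_stat, df


if __name__ == "__main__":
    # Hypothetical toy profiles, not data from the study.
    toy = {
        "geneA": [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2],
        "geneB": [0] * 19 + [100],
    }
    normal, non_normal = split_by_normality(toy)
    print(sorted(normal), sorted(non_normal))
    print(welch_t([1, 2, 3], [2, 3, 4]))
```

    In practice the p-value for the Welch statistic would come from a t distribution with the returned degrees of freedom; that lookup is left out here to keep the sketch dependency-free.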

    Results: The novel markers of AML are {ABCB11↑ ∪ KRT17↓} (i.e., ABCB11 as up-regulated, & KRT17 as down-regulated), and {AP1S1- ∪ KRT17↓ ∪ NEIL2- ∪ DYDC1↓} (i.e., AP1S1 and NEIL2 both as hypo-methylated, & KRT17 and DYDC1 both as down-regulated). The novel marker of PC is {UBIAD1¶ ∪ APBA2‡ ∪ C4orf31‡} (i.e., UBIAD1 as up-regulated and hypo-methylated, & APBA2 and C4orf31 both as down-regulated and hyper-methylated).

    Conclusion: The identified novel markers might have critical roles in AML as well as PC. The approach can be applied to other complex diseases.

    Sha Cao, Yi Zhou, Yue Wu, Tianci Song, Burair Alsaihati, Ying Xu

    Background: We aim to address one question: do cancer vs. normal tissue cells execute their transcription regulation essentially the same or differently, and why?

    Methods: We utilized an integrated computational study of cancer epigenomes and transcriptomes of 10 cancer types, by using penalized linear regression models to evaluate the regulatory effects of DNA methylations on gene expressions.

    Results: Our main discoveries are: (i) 56 genes have their expression consistently regulated by DNA methylation specifically in cancer, and these genes enrich pathways associated with micro-environmental stresses and responses, particularly oxidative stress; (ii) the level of involvement by DNA methylation in transcription regulation increases as a cancer advances for the majority of the cancer types examined; (iii) transcription regulation in cancer vs. control tissue cells is substantially different, with the former being largely done through direct DNA methylation and the latter mainly done via transcription factors; (iv) the altered DNA methylation landscapes in cancer vs. control are predominantly accomplished by DNMT1, TET3 and CBX2, which are predicted to be the result of persistent stresses present in the intracellular and micro-environments of cancer cells, consistent with the general understanding of epigenomic functions.

    Conclusions: Our integrative analyses discovered that a large class of genes is regulated via direct DNA methylation of the genes in cancer, compared with regulation via TFs in normal cells. Such genes fall into a few stress and response pathways. As a cancer advances, the level of involvement by direct DNA methylation in transcription regulation increases for the majority of the cancer types examined.

    Yiyi Liu, Hongyu Zhao

    Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which uses only the features with the largest variable importance scores. Yet the performance of this method is not satisfactory, possibly due to its rigid feature selection and the increased correlations between the trees of the forest.

    Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node to build up trees, samples features according to their variable importance scores and then selects the best split from the randomly selected features.
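    The node-level sampling step described above can be sketched as follows. This is an illustrative sketch only, not the code of the “viRandomForests” package: the function name, sampling without replacement, and the small positive floor applied to non-positive importance scores (so that weak features keep a nonzero chance) are all assumptions.

```python
import random


def sample_candidate_features(importance, mtry, rng, floor=1e-3):
    """Draw up to `mtry` distinct feature indices for one tree node.

    Each feature is drawn with probability proportional to its variable
    importance score instead of with equal probability. Scores below
    `floor` are raised to `floor` (an assumption of this sketch) so that
    less informative features are down-weighted but never excluded.
    """
    remaining = {i: max(score, floor) for i, score in enumerate(importance)}
    chosen = []
    while remaining and len(chosen) < mtry:
        features = list(remaining)
        weights = [remaining[f] for f in features]
        pick = rng.choices(features, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement within a node
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical importance scores for four features.
    scores = [0.5, 0.1, 0.0, 0.4]
    print(sample_candidate_features(scores, mtry=2, rng=rng))
```

    The best split would then be searched only among the returned candidates, exactly as in a standard Random Forests node; that part is unchanged and therefore omitted.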

    Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared to the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases.

    Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones, hence has improved prediction accuracy in the presence of weak signals and large noises. We have implemented an R package “viRandomForests” based on the original R package “randomForest” and it can be freely downloaded from

    Jiao Chen, Dongxiao Zhu, Yanni Sun

    Background: MicroRNAs (miRNAs) regulate target gene expression at the post-transcriptional level. Intense research has been conducted on miRNA identification and target finding. However, much less is known about the transcriptional regulation of miRNA genes themselves. Recently, a special group of pre-miRNAs that are produced directly by transcription without Drosha processing was validated in mouse, indicating the complexity of miRNA biogenesis.

    Methods: In this work, we detect clusters of aligned Cap-seq reads to find the transcription start sites (TSSs) for intergenic miRNAs and study their transcriptional regulation in Caenorhabditis elegans and mouse.
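    The idea of grouping aligned read 5′ ends into clusters and reporting one candidate TSS per cluster can be sketched as below. The abstract does not state the clustering rule, so the gap threshold, the minimum read count, and the choice of the most frequent position as the candidate TSS are hypothetical parameters of this sketch rather than the authors' settings.

```python
from collections import Counter


def cluster_read_starts(positions, max_gap=20, min_reads=5):
    """Group read 5' end positions (one strand, one chromosome) into clusters.

    Consecutive sorted positions closer than or equal to `max_gap` share a
    cluster. Clusters with fewer than `min_reads` reads are discarded.
    Returns a list of (start, end, candidate_tss, read_count) tuples, where
    candidate_tss is the most frequent position (smallest one on ties).
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    called = []
    for cluster in clusters:
        if len(cluster) < min_reads:
            continue
        counts = Counter(cluster)
        # max() keeps the first maximum, so sorting makes ties deterministic.
        tss = max(sorted(counts), key=counts.get)
        called.append((cluster[0], cluster[-1], tss, len(cluster)))
    return called


if __name__ == "__main__":
    # Hypothetical 5' end coordinates, not data from the study.
    starts = [100, 101, 101, 101, 103, 500, 900, 902, 902, 905, 905, 905, 905]
    for record in cluster_read_starts(starts, max_gap=20, min_reads=4):
        print(record)
```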

    Results: In both species, we have identified a class of special pre-miRNAs whose 5′ ends are capped, and are most probably generated directly by transcription. Furthermore, we distinguished another class of special pre-miRNAs that are 5′-capped but are also part of longer primary miRNAs, suggesting they may have more than one transcription mechanism. We detected multiple cap reads peaks within miRNA clusters in C. elegans. We surmised that the miRNAs in a cluster may either be transcribed independently or be re-capped during the microprocessor cleavage process. We also observed that H3K4me3 and Pol II are enriched at those identified miRNA TSSs.

    Conclusions: The Cap-seq datasets enabled us to annotate the primary TSSs for miRNA genes with high resolution. A special class of 5′-capped pre-miRNAs has been identified in both C. elegans and mouse. The capping pattern of miRNAs in a cluster indicates that clustered miRNA transcripts probably undergo a re-capping procedure during the microprocessor cleavage process.

    Weiming Zhang, Debashis Ghosh

    Background: Properly adjusting for unmeasured confounders is critical for health studies in order to achieve valid testing and estimation of the exposure’s causal effect on outcomes. The instrumental variable (IV) method has long been used in econometrics to estimate causal effects while accommodating the effect of unmeasured confounders. Mendelian randomization (MR), which uses genetic variants as the instrumental variables, is an application of the instrumental variable method to biomedical research, and has become popular in recent years. One often-used estimator of causal effects for instrumental variables and Mendelian randomization is the two-stage least squares (TSLS) estimator. The validity of TSLS relies on the accurate prediction of exposure based on IVs in its first stage.
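    For readers unfamiliar with TSLS, the two stages can be sketched for the simplest case of one genetic instrument and one exposure. This is a textbook linear TSLS sketch, not the LSKM-based first stage proposed in the note; the simulation design, effect sizes, and function names are assumptions chosen only to show how an unmeasured confounder biases the naive regression.

```python
import random


def ols_fit(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def tsls(instrument, exposure, outcome):
    """Two-stage least squares estimate of the exposure's causal effect."""
    # Stage 1: predict the exposure from the instrument.
    intercept, slope = ols_fit(instrument, exposure)
    predicted = [intercept + slope * g for g in instrument]
    # Stage 2: regress the outcome on the predicted exposure.
    _, causal_effect = ols_fit(predicted, outcome)
    return causal_effect


def simulate(n, true_effect, rng):
    """Hypothetical data: genotype g, unmeasured confounder u, exposure x, outcome y."""
    instrument, exposure, outcome = [], [], []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # 0, 1 or 2 risk alleles
        u = rng.gauss(0.0, 1.0)                          # affects both x and y
        x = 1.0 * g + u + rng.gauss(0.0, 1.0)
        y = true_effect * x + 3.0 * u + rng.gauss(0.0, 1.0)
        instrument.append(g)
        exposure.append(x)
        outcome.append(y)
    return instrument, exposure, outcome


if __name__ == "__main__":
    g, x, y = simulate(5000, true_effect=2.0, rng=random.Random(1))
    print("naive OLS slope:", ols_fit(x, y)[1])
    print("TSLS estimate:  ", tsls(g, x, y))
```

    Because the confounder is independent of the genotype, the second-stage slope is not contaminated by it, whereas the naive slope is. Replacing the linear first stage with a more flexible predictor is the step where a kernel machine would enter.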

    Results: In this note, we propose to model the link between exposure and genetic IVs using the least-squares kernel machine (LSKM). Simulation studies are used to evaluate the feasibility of LSKM in the TSLS setting.

    Conclusions: Our results show that LSKM based on genotype score or genotype can be used effectively in TSLS. It may provide higher power when the association between exposure and genetic IVs is nonlinear.

    Varshini Vasudevaraja, Jamie Renbarger, Ridhhi Girish Shah, Garrett Kinnebrew, Murray Korc, Limei Wang, Yang Huo, Enze Liu, Lang Li, Lijun Cheng

    Background: Precision medicine attempts to tailor the right therapy for the right patient. Recent progress in large-scale collection of patients’ tumor molecular profiles in The Cancer Genome Atlas (TCGA) provides a foundation for systematic discovery of potential drug targets specific to different types of cancer. However, we still lack powerful computational methods to effectively integrate multiple omics data and protein-protein interaction network technology for an optimum target and drug recommendation for an individual patient.

    Methods: In this study, a computational method, Precision Medicine Target-Drug Selection (PMTDS), based on genetic interaction networks, is developed to select the optimum targets and associated drugs for precision medicine style treatment of cancer. The PMTDS system includes three parts: a personalized medicine knowledgebase for each cancer type, a genetic interaction network-based algorithm, and a single patient’s molecular profiles. The knowledgebase integrates cancer drugs, drug-target databases and gene biological pathway networks. The molecular profiles of each tumor consist of DNA copy number alteration, gene mutation, and tumor gene expression variation compared to its adjacent normal tissue.

    Results: The novel integrated PMTDS system is applied to select candidate target-drug pairs for 178 TCGA pancreatic adenocarcinoma (PDAC) tumors. The experimental results show that known drug targets (EGFR, IGF1R, ERBB2, NR1I2 and AKR1B1) of PDAC treatment are identified, which provides important evidence of the PMTDS algorithm’s accuracy. Other potential targets, PTK6, ATF and SYK, are also recommended for PDAC. Further validation is provided by comparison of selected targets with both cell line molecular profiles from the Cancer Cell Line Encyclopedia (CCLE) and drug response data from the Cancer Therapeutics Response Portal (CTRP). Results from experimental analysis of forty-six individual pancreatic cancer samples show that drugs selected by PMTDS have more sample-specific efficacy than the current clinical PDAC therapies.

    Conclusions: A novel target and drug prioritization algorithm, PMTDS, is developed to identify optimum target-drug pairs by integrating the knowledgebase with a single patient’s genomics. The PMTDS system provides an accurate and reliable source for target and off-label drug selection for precision cancer medicine.