Predicting enhancer-promoter interaction from genomic sequence with deep neural networks
Shashank Singh, Yang Yang, Barnabás Póczos, Jian Ma
Background: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions.
Methods: Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given.
Results: Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions compared with state-of-the-art methods that only use information from a single cell type. As a proof of principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes.
Conclusions: This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.
Keywords: chromatin interaction / enhancer-promoter interaction / deep neural network
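To make the idea of "sequence-based features only" concrete, the sketch below shows one common way raw DNA could be turned into numeric input for a model that takes an enhancer and a promoter as separate inputs. The abstract does not state how SPEID encodes its input, so the one-hot scheme and the helper names `one_hot` and `encode_pair` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the one-hot scheme and the function names are
# assumptions, not details taken from the abstract.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered (A, C, G, T).

    Bases outside ACGT (for example N) are encoded as an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def encode_pair(enhancer_seq, promoter_seq):
    """Encode an enhancer and a promoter independently, one matrix each."""
    return one_hot(enhancer_seq), one_hot(promoter_seq)


if __name__ == "__main__":
    # Toy sequences; real enhancer and promoter windows are far longer.
    enhancer, promoter = encode_pair("ACGTN", "TTGA")
    print(len(enhancer), len(promoter))
```

Encoding the two elements separately reflects the setting described in the Methods: the enhancer and promoter locations are given, and only their sequences are passed to the model.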