Prediction of chromatin looping using deep hybrid learning (DHL)
Mateusz Chiliński, Anup Kumar Halder, Dariusz Plewczynski
Background: With the development of rapid and cheap sequencing techniques, the cost of whole-genome sequencing (WGS) has dropped significantly. However, the complexity of the human genome is not limited to the sequence alone, and additional experiments are required to learn how the genome influences complex traits. One of the most active areas of current research is the spatial organisation of the genome, which can be studied using spatial experiments (e.g., Hi-C, ChIA-PET). Information about spatial contacts supports the analysis and brings new insights into our understanding of disease development.
Methods: We used an ensemble of deep learning and classical machine learning algorithms. The deep learning network was DNABERT, which applies the transformer-based BERT language model to genomic sequences. The classical machine learning models were support vector machines (SVMs), random forests (RFs), and k-nearest neighbours (KNN). We refer to the combined approach as deep hybrid learning (DHL).
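The hybrid idea can be illustrated with a minimal sketch, assuming that DHL means passing features from the deep model to a classical classifier; the abstract does not specify how the models are combined. To keep the example runnable without a trained network, `embed` is a hypothetical stand-in that returns normalised k-mer frequencies rather than real DNABERT features, and the sequences and labels are synthetic toy data, not ChIA-PET anchors.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 3  # k-mer length; an arbitrary choice for this sketch
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 64 possible 3-mers


def embed(sequence):
    """Hypothetical stand-in for a deep feature extractor.

    In a real hybrid pipeline this would be a vector taken from the deep
    network; here it is the normalised frequency of each overlapping k-mer.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = max(sum(counts[kmer] for kmer in KMERS), 1)  # avoid division by zero
    return [counts[kmer] / total for kmer in KMERS]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classical KNN on the extracted features: majority label of the k nearest."""
    distances = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(vec, query))), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Synthetic toy data: label 1 for GC-rich sequences, 0 for AT-rich sequences.
train_sequences = [
    "GCGCGCGCGC", "CCGGCCGGCC", "GGCGCCGGCG",
    "ATATATATAT", "AATTAATTAA", "TTATAATATT",
]
train_y = [1, 1, 1, 0, 0, 0]
train_x = [embed(s) for s in train_sequences]

if __name__ == "__main__":
    for query in ("GCGGCGCCGC", "TATATTATAA"):
        print(query, knn_predict(train_x, train_y, embed(query)))
```

In this arrangement the classical model only ever sees the feature vectors, so KNN could be replaced by an SVM or a random forest without changing the feature extraction step.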
Results: We found that DNABERT can predict the interactions detected in ChIA-PET experiments with high precision. Additionally, the DHL approach improved the evaluation metrics on both the CTCF and RNAPII datasets.
Conclusions: The DHL approach should be considered for models that rely on deep learning. Although straightforward in concept, it can improve the results significantly.
Deep neural networks have transformed many fields of science and technology. In our work, we combine them with classical machine learning algorithms to create a robust method that can predict chromatin interactions in the nucleus. The task requires only the DNA sequence, which can be obtained with well-established techniques that are inexpensive and broadly available, both in scientific laboratories and through commercial kits.
deep learning / 3D genomics / transformers / spatial organisation of nucleus / ChIA-PET / DNA-Seq