Prediction of chromatin looping using deep hybrid learning (DHL)

Mateusz Chiliński, Anup Kumar Halder, Dariusz Plewczynski

Quant. Biol. 2023, Vol. 11, Issue 2: 155-162. DOI: 10.15302/J-QB-022-0315
RESEARCH ARTICLE

Abstract

Background: With the development of rapid and cheap sequencing techniques, the cost of whole-genome sequencing (WGS) has dropped significantly. However, the complexity of the human genome is not limited to its linear sequence, and additional experiments are required to learn how the genome influences complex traits. One of the most active areas of research is the spatial organisation of the genome, which can be probed using experiments that capture spatial contacts (e.g., Hi-C, ChIA-PET). Information about these contacts aids the analysis and brings new insights into our understanding of disease development.

Methods: We used an ensemble of deep learning and classical machine learning algorithms. The deep learning network was DNABERT, which applies the transformer-based BERT language model to genomic sequences. The classical machine learning models included support vector machines (SVMs), random forests (RFs), and K-nearest neighbours (KNN). We refer to the combined approach as deep hybrid learning (DHL).
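The hybrid idea described in the Methods can be illustrated with a minimal sketch. The abstract does not state how DNABERT and the classical models are wired together, so the sketch assumes one common arrangement: a feature extractor turns each sequence into a vector, and a classical classifier is trained on those vectors. Everything here is hypothetical and for illustration only: normalised 3-mer frequencies stand in for DNABERT embeddings, a hand-written K-nearest-neighbours vote stands in for the classical stage, and the toy sequences and labels are made up.

```python
from collections import Counter
from itertools import product
import math

K = 3  # assumed k-mer size for the stand-in featuriser (not from the paper)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 64 possible 3-mers


def embed(seq):
    """Stand-in for a deep-network embedding: normalised k-mer frequencies."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero on short input
    return [counts[kmer] / total for kmer in KMERS]


def knn_predict(train_x, train_y, query, k=3):
    """Classical stage: majority vote among the k nearest training vectors."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: label 1 = "interacting", label 0 = "background".
train_seqs = [
    "ACGTACGTACGT", "ACGTACGAACGT", "ACGTTCGTACGT",
    "GGGGGCCCCCGG", "GGGCGCCCCGGG", "GGGGGCCGCCGG",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Stage 1: embed every sequence. Stage 2: classify in the embedded space.
train_x = [embed(s) for s in train_seqs]
pred_a = knn_predict(train_x, train_labels, embed("ACGTACGTACGA"))
pred_b = knn_predict(train_x, train_labels, embed("GGGGCCCCCCGG"))
print(pred_a, pred_b)
```

Under this arrangement, an SVM or a random forest would replace `knn_predict` without changing the embedding stage, which is what makes the pattern easy to layer on an existing deep model.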

Results: We found that DNABERT can predict the interactions detected in ChIA-PET experiments with high precision. Additionally, the DHL approach improved the evaluation metrics on the CTCF and RNAPII datasets.

Conclusions: The DHL approach should be considered for models that rely on deep learning. Although conceptually straightforward, it can improve the results significantly.

Author summary

Deep neural networks have revolutionised every aspect of life. In our work, we apply them together with classical machine learning algorithms to create a robust artificial intelligence algorithm that can predict interactions in the nucleus. The task requires only the DNA sequence, which can be obtained using well-known techniques and is both cheap and broadly available, in scientific laboratories as well as through commercially available kits.

Keywords

deep learning / 3D genomics / transformers / spatial organisation of nucleus / ChIA-PET / DNA-Seq

Cite this article

Mateusz Chiliński, Anup Kumar Halder, Dariusz Plewczynski. Prediction of chromatin looping using deep hybrid learning (DHL). Quant. Biol., 2023, 11(2): 155‒162 https://doi.org/10.15302/J-QB-022-0315

SUPPLEMENTARY MATERIALS

The descriptions of all detailed fold-wise experiments, together with the individual fold-specific results and hold-out result tables, are listed as Tables S1−S12 and can be found online with this article at https://doi.org/10.15302/J-QB-022-0315.

ACKNOWLEDGEMENTS

This work has been supported by National Science Centre, Poland (Nos. 2019/35/O/ST6/02484 and 2020/37/B/NZ2/03757); Foundation for Polish Science, co-financed by the European Union under the European Regional Development Fund (TEAM to DP). The work has been co-supported by European Commission Horizon 2020 Marie Skłodowska-Curie ITN Enhpathy grant “Molecular Basis of Human enhanceropathies” and National Institute of Health USA 4DNucleome grant 1U54DK107967-01 “Nucleome Positioning System for Spatiotemporal Genome Organization and Regulation”. Research was co-funded by Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) programme. Computations were performed thanks to the Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology using the Artificial Intelligence HPC platform financed by Polish Ministry of Science and Higher Education (No. 7054/IA/SP/2020 of 2020-08-28).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Mateusz Chiliński, Anup Kumar Halder and Dariusz Plewczynski declare that they have no conflict of interest or financial conflicts to disclose.
All procedures performed in studies involving animals were in accordance with the ethical standards of the institution or practice at which the studies were conducted, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

2023 The Author(s). Published by Higher Education Press.