Computational tools for Hi-C data analysis
Zhijun Han, Gang Wei
Background: In eukaryotic genomes, chromatin is not randomly distributed in the cell nucleus but is organized into higher-order structures. Emerging evidence indicates that these higher-order chromatin structures play important roles in regulating genome functions such as transcription and DNA replication. With advances in 3C (chromosome conformation capture)-based technologies, Hi-C has been widely used to investigate genome-wide long-range chromatin interactions during cellular differentiation and oncogenesis. Since the Hi-C assay was first published in 2009, many bioinformatic tools have been implemented for processing Hi-C data, from mapping raw reads to normalizing contact matrices and higher-level interpretation; some provide a complete workflow pipeline, while others focus on a particular step.
Results: This article reviews the general Hi-C data processing workflow and the currently popular Hi-C data processing tools. We highlight how these tools are used for a full interpretation of Hi-C results.
Conclusions: The Hi-C assay is a powerful tool for investigating higher-order chromatin structure. Continued development of novel methods for Hi-C data analysis will be necessary to better understand the regulatory function of genome organization.
Keywords: 3D genome structure; Hi-C data processing tool; chromatin interactions