Effectiveness of machine learning at modeling the relationship between Hi‐C data and copy number variation

Yuyang Wang, Yu Sun, Zeyu Liu, Bijia Chen, Hebing Chen, Chao Ren, Xuanwei Lin, Pengzhen Hu, Peiheng Jia, Xiang Xu, Kang Xu, Ximeng Liu, Hao Li, Xiaochen Bo

Quant. Biol. 2024, Vol. 12, Issue 3: 231-244. DOI: 10.1002/qub2.52
RESEARCH ARTICLE

Abstract

Copy number variation (CNV) refers to variation in the number of copies of a specific sequence in a genome and is a type of chromatin structural variation. The development of the Hi‐C technique has empowered research on the spatial structure of chromatin by capturing interactions between DNA fragments. We used machine‐learning methods, including a linear transformation model and a graph convolutional network (GCN), to detect CNV events from Hi‐C data and to reveal how CNV relates to three‐dimensional interactions between genomic fragments, in terms of both the one‐dimensional read count signal and features of the chromatin structure. The experimental results demonstrated a chromosome‐specific linear relation between the Hi‐C read count and CNV that is well described by the linear transformation model. In addition, the GCN‐based model could accurately extract features of the spatial structure from Hi‐C data and infer the corresponding CNV across different chromosomes in a cancer cell line. We performed a series of experiments, including dimension reduction, transfer learning, and Hi‐C data perturbation, to comprehensively evaluate the utility and robustness of the GCN‐based model. This work provides a benchmark for using machine learning to infer CNV from Hi‐C data and serves as a necessary foundation for a deeper understanding of the relationship between Hi‐C data and CNV.

Keywords

copy number variant / deep learning / graph convolution network / Hi‐C

Cite this article

Yuyang Wang, Yu Sun, Zeyu Liu, Bijia Chen, Hebing Chen, Chao Ren, Xuanwei Lin, Pengzhen Hu, Peiheng Jia, Xiang Xu, Kang Xu, Ximeng Liu, Hao Li, Xiaochen Bo. Effectiveness of machine learning at modeling the relationship between Hi‐C data and copy number variation. Quant. Biol., 2024, 12(3): 231‒244 https://doi.org/10.1002/qub2.52

RIGHTS & PERMISSIONS

2024 The Author(s). Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.