A benchmarking study of copy number variation inference methods using single-cell RNA-sequencing data

Xin Chen, Li Tai Fang, Zhong Chen, Wanqiu Chen, Hongjin Wu, Bin Zhu, Malcolm Moos Jr, Andrew Farmer, Xiaowen Zhang, Wei Xiong, Shusheng Gong, Wendell Jones, Christopher E. Mason, Shixiu Wu, Chunlin Xiao, Charles Wang

Precision Clinical Medicine, 2025, Vol. 8, Issue 2: pbaf011

DOI: 10.1093/pcmedi/pbaf011

Research Article

Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful tool for cancer research, enabling in-depth characterization of tumor heterogeneity at the single-cell level. Recently, several scRNA-seq copy number variation (scCNV) inference methods have been developed, expanding the application of scRNA-seq to study genetic heterogeneity in cancer using transcriptomic data. However, the fidelity of these methods has not been investigated systematically.

Methods: We benchmarked five commonly used scCNV inference methods: HoneyBADGER, CopyKAT, CaSpER, inferCNV, and sciCNV. We evaluated their performance across four different scRNA-seq platforms using data from our previous multicenter study. We further evaluated their performance using scRNA-seq datasets derived from mixed samples consisting of five human lung adenocarcinoma cell lines. We also sequenced tissues from a small cell lung cancer patient and used this clinical scRNA-seq dataset to validate our findings.

Results: We found that the sensitivity and specificity of the five scCNV inference methods varied depending on the selection of reference data, sequencing depth, and read length. CopyKAT and CaSpER outperformed the other methods overall, while inferCNV, sciCNV, and CopyKAT performed better than the other methods in subclone identification. We found that batch effects significantly affected the performance of subclone identification in mixed datasets for most of the methods we tested.

Conclusion: Our benchmarking study revealed the strengths and weaknesses of each of these scCNV inference methods and provided guidance for selecting the optimal CNV inference method using scRNA-seq data.

Keywords

scRNA-seq / RNA-seq / copy number variation (CNV) inference / scRNA-seq CNV methods / benchmarking

Cite this article

Xin Chen, Li Tai Fang, Zhong Chen, Wanqiu Chen, Hongjin Wu, Bin Zhu, Malcolm Moos Jr, Andrew Farmer, Xiaowen Zhang, Wei Xiong, Shusheng Gong, Wendell Jones, Christopher E. Mason, Shixiu Wu, Chunlin Xiao, Charles Wang. A benchmarking study of copy number variation inference methods using single-cell RNA-sequencing data. Precision Clinical Medicine, 2025, 8(2): pbaf011. DOI: 10.1093/pcmedi/pbaf011

Acknowledgements

The authors would like to thank Ms. Diana Ho and Ms. Adriana Lopez of the LLU Center for Genomics for their administrative support, particularly in coordinating the Zoom conference calls for the project. The authors would also like to thank Dr. Feng Zeng for his help with editing Figure 1 and Dr. Lijuan Song for her assistance with proofreading the reference citations. The genomic work carried out at the LLU Center for Genomics was funded in part by the Ardmore Institute of Health grant 2150141 (CW) and Dr. Charles A. Sims' gift to the LLU Center for Genomics. The work of Chunlin Xiao was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health.

Author contributions

Xin Chen (Data curation, Formal analysis, Investigation, Visualization, Writing - original draft, Writing - review & editing), Li Tai Fang (Investigation, Validation, Writing - review & editing), Zhong Chen (Writing - review & editing), Wanqiu Chen (Writing - review & editing), Hongjin Wu (Writing - review & editing), Bin Zhu (Writing - review & editing), Malcolm Moos Jr (Writing - review & editing), Andrew Farmer (Writing - review & editing), Xiaowen Zhang (Writing - review & editing), Wei Xiong (Writing - review & editing), Shusheng Gong (Writing - review & editing), Wendell Jones (Writing - review & editing), Christopher E. Mason (Writing - review & editing), Shixiu Wu (Writing - review & editing), Chunlin Xiao (Writing - review & editing), Charles Wang (Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing - original draft, Writing - review & editing)

Supplementary data

Supplementary data is available at PCMEDI online.

Conflicts of interest

Andrew Farmer is an employee of Takara Bio USA, Inc., and Wendell Jones is an employee of IQVIA Laboratories Genomics. All other authors declare that they have no conflicts of interest. The views presented in this article do not necessarily reflect the current or future opinion or policy of the US Food and Drug Administration. Any mention of commercial products is for clarification and not intended as an endorsement. In addition, as Editorial Board Members of Precision Clinical Medicine, the author Christopher E. Mason and the corresponding author Charles Wang were blinded from reviewing and making decisions on this manuscript.
