Accurate cell type annotation for single-cell chromatin accessibility data via contrastive learning and reference guidance

Siyu Li, Songming Tang, Yunchang Wang, Sijie Li, Yuhang Jia, Shengquan Chen

Quant. Biol. 2024, Vol. 12, Issue 1: 85-99. DOI: 10.1002/qub2.33
RESEARCH ARTICLE


Abstract

Recent advances in single-cell chromatin accessibility sequencing (scCAS) technologies have yielded new insights into epigenomic heterogeneity and have increased the need for automatic cell type annotation. However, existing automatic annotation methods for scCAS data fail to incorporate reference data and neglect novel cell types that exist only in the test set. Here, we propose RAINBOW, a reference-guided automatic annotation method based on a contrastive learning framework, which can effectively identify novel cell types in a test set. By using contrastive learning and incorporating reference data, RAINBOW effectively characterizes the heterogeneity of cell types, thereby enabling more accurate annotation. Through extensive experiments on multiple scCAS datasets, we show the advantages of RAINBOW over state-of-the-art methods in annotating both known and novel cell types. We also verify the effectiveness of incorporating reference data during training. In addition, we demonstrate the robustness of RAINBOW to data sparsity and to the number of cell types. Furthermore, RAINBOW performs well on newly sequenced data and can reveal biological implications in downstream analyses. Together, these results demonstrate the superior performance of RAINBOW in cell type annotation for scCAS data. We anticipate that RAINBOW will offer useful guidance and assistance in scCAS data analysis. The source code is available on GitHub (BioX-NKU/RAINBOW).
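
The abstract does not specify the loss function used by RAINBOW. As a rough illustration of what contrastive learning on labeled cells can look like, the sketch below computes a generic supervised contrastive loss in NumPy: embeddings of cells with the same label are pulled together and all others are pushed apart. The function name, the temperature value, and the toy embeddings are assumptions made for illustration and are not taken from the RAINBOW source code.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss; a sketch, not RAINBOW's implementation."""
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Shift each row by its maximum for numerical stability.
    sim = sim - sim.max(axis=1, keepdims=True)
    # Softmax denominator over every other cell (self excluded).
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    # Positives: other cells that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())

# Toy example (hypothetical 2-D embeddings for four cells).
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy case the loss is lower when the labels agree with the geometry of the embeddings than when they do not, which is the signal a contrastive encoder is trained to exploit.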

Keywords

cell type annotation / chromatin accessibility / novel type / reference-guided / single-cell

Cite this article

Siyu Li, Songming Tang, Yunchang Wang, Sijie Li, Yuhang Jia, Shengquan Chen. Accurate cell type annotation for single-cell chromatin accessibility data via contrastive learning and reference guidance. Quant. Biol., 2024, 12(1): 85-99. DOI: 10.1002/qub2.33



RIGHTS & PERMISSIONS

2024 The Authors. Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.
