scSCC: A swapped contrastive learning-based clustering method for single-cell gene expression data

Xiang Wang, Sansheng Yang, Hongwei Li

Quant. Biol. 2025, Vol. 13, Issue (2): e85. DOI: 10.1002/qub2.85
RESEARCH ARTICLE


Abstract

Cell clustering plays a pivotal role in identifying cell types and supports downstream cell annotation in scRNA-seq data analysis. In this paper, we propose scSCC, a novel swapped contrastive clustering algorithm for scRNA-seq data. scSCC combines two contrastive learning modules, an instance contrastive learning module and a swapped prediction module, to extract clustering-friendly cell representations. By combining the two modules, scSCC retrieves disentangled cell representations and amplifies the clustering signals in the latent space, leading to satisfactory clustering performance. Unlike existing contrastive-learning-based clustering algorithms for scRNA-seq data, the swapped prediction module of scSCC injects clustering signals into the latent space through a set of clustering prototypes. This module encourages cells of the same cluster to gravitate toward a common prototype and to stay away from the other prototypes in the latent space, so the cell representations obtained by scSCC are more clustering-friendly than those of other algorithms. Experimental results on real scRNA-seq datasets show that scSCC achieves improved clustering performance compared with the benchmark methods. An ablation study on the two contrastive modules shows the benefit of combining the instance contrastive learning module with the swapped prediction module. The source code is available on GitHub (EnchantedJoy/scSCC).
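
The two objectives named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the temperatures, and the use of Sinkhorn-Knopp balancing to obtain soft prototype assignments are assumptions chosen for illustration, following the common form of swapped prediction in which the assignment computed from one augmented view of a cell is predicted from the other view.

```python
import numpy as np

def l2_normalize(x, axis=1, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(x, axis=1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def sinkhorn(scores, eps=0.05, n_iters=3):
    # Assumed balancing step: soft assignments of cells (rows) to
    # prototypes (columns), spread roughly evenly over the prototypes.
    q = np.exp((scores - scores.max()) / eps).T      # shape (K, B)
    q /= q.sum()
    k, b = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)            # normalize prototypes
        q /= k
        q /= q.sum(axis=0, keepdims=True)            # normalize cells
        q /= b
    return (q * b).T                                 # each row sums to 1

def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
    # Predict the assignment of one view from the embedding of the other.
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    c = l2_normalize(prototypes)
    s1, s2 = z1 @ c.T, z2 @ c.T                      # cell-prototype scores
    q1, q2 = sinkhorn(s1), sinkhorn(s2)              # targets
    p1 = log_softmax(s1 / temperature)
    p2 = log_softmax(s2 / temperature)
    return -0.5 * np.mean((q1 * p2).sum(axis=1) + (q2 * p1).sum(axis=1))

def instance_contrastive_loss(z1, z2, temperature=0.5):
    # InfoNCE-style loss: the two views of a cell form the positive pair,
    # all other embeddings in the batch act as negatives.
    n = z1.shape[0]
    z = l2_normalize(np.concatenate([z1, z2]))
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                   # exclude self-pairs
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return -log_softmax(sim)[np.arange(2 * n), pos].mean()

# Toy usage with random "embeddings" standing in for encoder outputs.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(16, 8))
view2 = view1 + 0.05 * rng.normal(size=(16, 8))      # second augmented view
protos = rng.normal(size=(4, 8))                     # 4 hypothetical prototypes
total = instance_contrastive_loss(view1, view2) + swapped_prediction_loss(view1, view2, protos)
print(float(total))
```

In this sketch the two losses are simply summed. How scSCC weights and trains the two modules is not stated in the abstract, so the equal weighting here is only a placeholder.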

Keywords

single-cell RNA-seq / clustering / contrastive learning / swapped prediction

Cite this article

Xiang Wang, Sansheng Yang, Hongwei Li. scSCC: A swapped contrastive learning-based clustering method for single-cell gene expression data. Quant. Biol., 2025, 13(2): e85 DOI:10.1002/qub2.85



RIGHTS & PERMISSIONS

The Author(s). Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.
