A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data

Yuwei Huang, Huidan Chang, Xiaoyi Chen, Jiayue Meng, Mengyao Han, Tao Huang, Liyun Yuan, Guoqing Zhang

Quant. Biol., 2023, Vol. 11, Issue 2: 163–174.

DOI: 10.15302/J-QB-022-0311
RESEARCH ARTICLE

Abstract

Background: Precise and efficient analysis of single-cell transcriptome data provides powerful support for studying the diversity of cell functions at the single-cell level. The most important and challenging steps in this analysis are cell clustering and the recognition of cell populations. Because most current studies consider clustering precision and annotation separately, it is worth developing a comprehensive and flexible strategy that balances clustering accuracy with biological interpretability.

Methods: The cell marker-based clustering strategy (cmCluster) is a modified Louvain clustering method that searches for the optimal clusters using a genetic algorithm (GA) and grid search, guided by the cell type annotation results.

Results: When cmCluster was applied to a set of single-cell transcriptome datasets, it aided the recognition of cell populations and the interpretation of biological function, even when cell type information was incomplete or the data came from multiple sources. In addition, cmCluster produced clear cluster boundaries and appropriate subtypes with potential marker genes. The code is available on GitHub (huangyuwei301/cmCluster).

Conclusions: We speculate that cmCluster offers researchers an effective screening strategy that can improve the accuracy of downstream biological analysis, reduce human bias, and facilitate comparison and analysis across multiple studies.
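The Methods summary describes choosing clustering parameters by how well the resulting clusters agree with cell type annotation. The sketch below illustrates only the grid-search half of that idea in plain Python, under loudly stated assumptions: a toy one-dimensional gap clustering (the hypothetical `gap_clustering`) stands in for the modified Louvain step, the adjusted Rand index against per-cell marker-based labels is used as an illustrative objective, and the genetic-algorithm component is omitted entirely. It is not the cmCluster implementation, whose actual objective is defined in the paper and the authors' repository.

```python
from collections import Counter
from math import comb
from typing import Callable, Sequence


def adjusted_rand_index(a: Sequence, b: Sequence) -> float:
    """Adjusted Rand index between two labelings of the same cells (n >= 2)."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def gap_clustering(values: Sequence[float], gap: float) -> list[int]:
    """HYPOTHETICAL stand-in for Louvain: sort cells by a 1-D score and
    start a new cluster wherever consecutive values differ by more than `gap`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > gap:
            cluster += 1
        labels[cur] = cluster
    return labels


def grid_search(
    values: Sequence[float],
    marker_labels: Sequence[str],
    grid: Sequence[float],
    cluster_fn: Callable[[Sequence[float], float], list[int]] = gap_clustering,
):
    """Try each parameter in `grid`; keep the clustering that best agrees
    with the marker-based annotation (ASSUMED objective: ARI)."""
    best = None
    for param in grid:
        labels = cluster_fn(values, param)
        score = adjusted_rand_index(labels, marker_labels)
        if best is None or score > best[1]:
            best = (param, score, labels)
    return best


# Toy data: one score per cell plus a marker-based label per cell.
values = [0.1, 0.2, 0.15, 1.0, 1.1, 1.05, 3.0, 3.2]
marker_labels = ["T", "T", "T", "B", "B", "B", "Mono", "Mono"]
best = grid_search(values, marker_labels, [0.01, 0.5, 5.0])
print(best)  # (0.5, 1.0, [0, 0, 0, 1, 1, 1, 2, 2])
```

Here the smallest gap splits every cell into its own cluster and the largest merges everything, so both score 0, while the intermediate value recovers the three annotated populations. In a real pipeline the stand-in would presumably be replaced by Louvain clustering of a cell neighbour graph at each candidate resolution, and a search strategy such as a GA could explore a larger parameter space than a fixed grid.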

Keywords

single-cell RNA-seq / clustering / cell markers / novel cell types

Cite this article

Yuwei Huang, Huidan Chang, Xiaoyi Chen, Jiayue Meng, Mengyao Han, Tao Huang, Liyun Yuan, Guoqing Zhang. A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data. Quant. Biol., 2023, 11(2): 163–174. DOI: 10.15302/J-QB-022-0311

RIGHTS & PERMISSIONS

The Author(s). Published by Higher Education Press.

Supplementary files

QB-22311-OF-YLY_suppl_1