A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data

Yuwei Huang, Huidan Chang, Xiaoyi Chen, Jiayue Meng, Mengyao Han, Tao Huang, Liyun Yuan, Guoqing Zhang

Quant. Biol., 2023, Vol. 11, Issue 2: 163-174.

DOI: 10.15302/J-QB-022-0311
RESEARCH ARTICLE

Abstract

Background: Precise and efficient analysis of single-cell transcriptome data provides powerful support for studying the diversity of cell functions at the single-cell level. The most important and challenging steps are cell clustering and the recognition of cell populations. Because most current studies consider the precision of clustering and the precision of annotation separately, it is worth developing an extensive and flexible strategy that balances clustering accuracy and biological interpretation comprehensively.

Methods: The cell marker-based clustering strategy (cmCluster), a modified Louvain clustering method, searches for the optimal clusters using a genetic algorithm (GA) and grid search, guided by the cell type annotation results.
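The idea of tuning a clustering parameter against marker-based annotation can be illustrated with a minimal sketch. It is not the cmCluster implementation: the toy expression matrix, the marker table, the threshold-graph clustering (a simple stand-in for Louvain, with the distance threshold playing the role of the resolution), and the scoring function with its `penalty` term are all assumptions made for illustration, and only the grid search is shown, not the GA.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: 9 cells x 6 genes (not from the paper).
GENES = ["CD3D", "CD3E", "CD79A", "MS4A1", "CD14", "LYZ"]
MARKERS = {  # assumed marker table, in the spirit of a cell marker database
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
    "Monocyte": ["CD14", "LYZ"],
}
CELLS = [
    [5, 4, 0, 0, 0, 1], [4, 5, 0, 1, 0, 0], [5, 5, 1, 0, 0, 0],
    [0, 0, 5, 4, 1, 0], [0, 1, 4, 5, 0, 0], [1, 0, 5, 5, 0, 0],
    [0, 0, 0, 1, 5, 4], [0, 0, 1, 0, 4, 5], [1, 0, 0, 0, 5, 5],
]


def cluster_cells(cells, threshold):
    """Stand-in for Louvain: connected components of the graph that links
    every pair of cells whose Euclidean distance is below `threshold`."""
    parent = list(range(len(cells)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cells)), 2):
        if dist(cells[i], cells[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(cells)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def annotate(cluster, cells):
    """Label a cluster with the cell type whose markers have the highest
    mean expression across the cluster's cells."""
    def marker_score(cell_type):
        cols = [GENES.index(g) for g in MARKERS[cell_type]]
        total = sum(cells[i][c] for i in cluster for c in cols)
        return total / (len(cluster) * len(cols))

    return max(MARKERS, key=marker_score)


def annotation_score(clusters, cells, penalty=0.5):
    """Hypothetical objective: reward distinct annotated cell types and
    penalize redundant clusters that repeat an existing annotation."""
    labels = [annotate(c, cells) for c in clusters]
    n_types = len(set(labels))
    return n_types - penalty * (len(labels) - n_types)


def grid_search(cells, thresholds):
    """Score every candidate threshold; return the first one with the best score."""
    scored = [(annotation_score(cluster_cells(cells, t), cells), t)
              for t in thresholds]
    best_score = max(s for s, _ in scored)
    best_t = next(t for s, t in scored if s == best_score)
    return best_t, best_score


best_threshold, best_score = grid_search(CELLS, [1.0, 1.8, 3.0, 6.0, 10.0])
best_clusters = cluster_cells(CELLS, best_threshold)
print(best_threshold, best_score, [annotate(c, CELLS) for c in best_clusters])
```

In this toy setting, a threshold that is too strict splits the cells into redundant clusters with repeated labels, while one that is too loose merges distinct cell types, so the annotation-based score favours the intermediate settings.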

Results: Applied to a set of single-cell transcriptome datasets, cmCluster improved the recognition of cell populations and the interpretation of biological function, even when cell type information was incomplete or the data came from multiple sources. cmCluster also produced clear cluster boundaries and appropriate subtypes with potential marker genes. The code is available on GitHub (huangyuwei301/cmCluster).

Conclusions: We speculate that cmCluster provides researchers with effective screening strategies to improve the accuracy of subsequent biological analysis, reduce artificial bias, and facilitate the comparison and analysis of multiple studies.

Author summary

Clustering of scRNA-seq data is both important and challenging because recent methods introduce difficulty and bias into the identification of cell types and the interpretation of cell function. We propose a cell marker-based clustering strategy (cmCluster) that determines accurate clusters by introducing a knowledge benchmark during the fine adjustment of clustering. cmCluster helps to obtain finer cell populations from single-cell transcriptome data with both known and unknown cell type labels. These populations are suitable for identifying cell types or acquiring features, especially in massive scRNA-seq data with complex or potentially novel cell types.

Keywords

single-cell RNA-seq / clustering / cell markers / novel cell types

Cite this article

Yuwei Huang, Huidan Chang, Xiaoyi Chen, Jiayue Meng, Mengyao Han, Tao Huang, Liyun Yuan, Guoqing Zhang. A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data. Quant. Biol., 2023, 11(2): 163‒174 https://doi.org/10.15302/J-QB-022-0311

SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at https://doi.org/10.15302/J-QB-022-0311

ACKNOWLEDGMENTS

This work was supported by the National Major Scientific Instrument and Equipment Development Project of NSFC (81827901), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB38030100 and XDB38050200), the II Phase External Project of Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences (2020YJY0217), and the Shanghai Municipal Science and Technology Major Project (2017SHZDZX01).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Yuwei Huang, Huidan Chang, Xiaoyi Chen, Jiayue Meng, Mengyao Han, Tao Huang, Liyun Yuan and Guoqing Zhang declare that they have no conflicts of interest.
This article does not contain any studies with human or animal subjects performed by any of the authors. All procedures performed in studies were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

2023 The Author(s). Published by Higher Education Press.