
A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data
Yuwei Huang, Huidan Chang, Xiaoyi Chen, Jiayue Meng, Mengyao Han, Tao Huang, Liyun Yuan, Guoqing Zhang
Quant. Biol., 2023, Vol. 11, Issue 2: 163-174.
Background: The precise and efficient analysis of single-cell transcriptome data provides powerful support for studying the diversity of cell functions at the single-cell level. The most important and challenging steps are cell clustering and the recognition of cell populations. While most current studies consider the precision of clustering and the precision of annotation separately, it is worth developing an extensive and flexible strategy that balances clustering accuracy and biological interpretation.
Methods: The cell marker-based clustering strategy (cmCluster), a modified Louvain clustering method, searches for the optimal clusters through a genetic algorithm (GA) and grid search, guided by the cell type annotation results.
Results: When applied to a set of single-cell transcriptome data, cmCluster benefited the recognition of cell populations and the interpretation of biological function, even with incomplete cell type information or multiple data sources. In addition, cmCluster produced clear cluster boundaries and appropriate subtypes with potential marker genes. The relevant code is available on GitHub (huangyuwei301/cmCluster).
Conclusions: We speculate that cmCluster provides researchers with effective screening strategies that improve the accuracy of subsequent biological analysis, reduce artificial bias, and facilitate the comparison and analysis of multiple studies.
Clustering is both important and challenging for scRNA-seq data because recent methods introduce difficulty and bias into the identification of cell types and the interpretation of cell function. We proposed a cell marker-based clustering strategy (cmCluster) that determines accurate clusters by introducing a knowledge benchmark during the fine adjustment of clustering. cmCluster helped to obtain finer cell populations for single-cell transcriptome data with both known and unknown cell type labels. These populations are suitable for identifying cell types or acquiring features, especially for massive scRNA-seq data with complex or potentially novel cell types.
single-cell RNA-seq / clustering / cell markers / novel cell types
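
The Methods statement describes choosing a clustering by how well its clusters can be annotated with known cell markers. The following is a minimal sketch of that idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: the toy expression values, the two-entry marker table, the connected-components clustering that stands in for Louvain (its `radius` plays the role of a resolution parameter), and the scoring function, which rewards clusters with a clear marker signal and penalizes several clusters receiving the same label. Only the grid search is shown; the genetic algorithm step is omitted.

```python
from itertools import combinations
from math import dist

# Hypothetical marker table and toy expression values (illustration only).
MARKERS = {"T cell": ["CD3E"], "B cell": ["CD79A"]}
GENES = ["CD3E", "CD79A"]
CELLS = [
    (5.0, 0.2), (4.6, 0.4), (5.4, 0.1), (4.0, 0.5),  # T-cell-like
    (0.3, 5.1), (0.1, 4.5), (0.5, 5.5), (0.4, 4.0),  # B-cell-like
]


def cluster(cells, radius):
    """Stand-in for Louvain (assumption): connected components of the graph
    linking cells whose Euclidean distance is at most `radius`.
    Returns one cluster id per cell."""
    parent = list(range(len(cells)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cells)), 2):
        if dist(cells[i], cells[j]) <= radius:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(cells))]


def annotate(cells, labels):
    """Label each cluster with the cell type whose markers have the highest
    mean expression. Returns {cluster_id: (cell_type, margin, size)}, where
    margin is the gap between the best and second-best cell type."""
    result = {}
    for cid in set(labels):
        members = [c for c, lab in zip(cells, labels) if lab == cid]
        means = {}
        for cell_type, genes in MARKERS.items():
            values = [c[GENES.index(g)] for c in members for g in genes]
            means[cell_type] = sum(values) / len(values)
        ranked = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
        result[cid] = (ranked[0][0], ranked[0][1] - ranked[1][1], len(members))
    return result


def score(cells, labels):
    """Hypothetical annotation-based score: size-weighted marker margin,
    scaled down when several clusters share the same cell type label."""
    ann = annotate(cells, labels)
    distinct = len({cell_type for cell_type, _, _ in ann.values()})
    weighted_margin = sum(m * n for _, m, n in ann.values()) / len(cells)
    return distinct / len(ann) * weighted_margin


def grid_search(cells, radii):
    """Try every candidate radius and keep the best-scoring clustering."""
    best = max(radii, key=lambda r: score(cells, cluster(cells, r)))
    return best, cluster(cells, best)


best_radius, best_labels = grid_search(CELLS, [0.3, 1.0, 6.0])
print(best_radius, annotate(CELLS, best_labels))
```

In this toy setting the smallest radius leaves every cell in its own cluster (many clusters with the same label) and the largest merges everything into one cluster with no clear marker signal, so the score favors the intermediate radius. With real data one would replace `cluster` with an actual Louvain call and `score` with whatever annotation criterion the analysis uses.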