Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT

Ruojin Yan , Chunmei Fan , Shen Gu , Tingzhang Wang , Zi Yin , Xiao Chen

Protein Cell, 2025, Vol. 16, Issue 8: 685-704.

DOI: 10.1093/procel/pwaf001
RESEARCH ARTICLE

Abstract

Identifying disease-specific cell subtypes (DSCSs) has profound implications for understanding disease mechanisms, preoperative diagnosis, and precision therapy. However, unified annotation of DSCSs across heterogeneous single-cell datasets remains a challenge. In this study, we developed the gPRINT algorithm (generalized approach for cell subtype identification with single cell’s voicePRINT). Inspired by the principles of speech recognition in noisy environments, gPRINT exploits the ordered and clustered nature of gene expression to transform gene position and gene expression information into voiceprints, yielding a unique “gene print” pattern for each cell. We then integrated neural networks to mitigate the impact of background noise on cell identity label mapping. We demonstrated the reproducibility of gPRINT across different donors, single-cell sequencing platforms, and disease subtypes, as well as its utility for automatic cell subtype annotation across datasets. Moreover, gPRINT achieved an annotation accuracy of 98.37% in external validation on the same tissue, surpassing other algorithms. Furthermore, we applied this approach to fibrosis-associated diseases in multiple tissues throughout the body, and to the annotation of fibroblast subtypes in a single tissue, tendon, where fibrosis is prevalent. We achieved automatic prediction of tendinopathy-specific cell subtypes, key targets, and related drugs. In summary, gPRINT provides an automated and unified approach for identifying DSCSs across datasets, facilitating the elucidation of specific cell subtypes under different disease states and providing a powerful tool for exploring therapeutic targets in diseases.
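The idea of turning gene position and expression into a per-cell signal can be illustrated with a minimal sketch. It is not the gPRINT implementation: the function names (`gene_print`, `nearest_centroid_label`), the log transform, the fixed number of bins, and the toy data are all assumptions made for illustration, and a nearest-centroid rule stands in for the neural network described in the abstract.

```python
import numpy as np

def gene_print(expression, chrom_index, start_pos, n_bins=8):
    """Hypothetical 'gene print': order genes by genomic position, then
    average log-expression over contiguous bins to get a 1D signal."""
    # np.lexsort treats the LAST key as primary: chromosome first, then start.
    order = np.lexsort((start_pos, chrom_index))
    signal = np.log1p(np.asarray(expression, dtype=float)[order])
    return np.array([chunk.mean() for chunk in np.array_split(signal, n_bins)])

def nearest_centroid_label(query_print, reference_prints, reference_labels):
    """Stand-in classifier (assumption): assign the label whose mean
    reference print is closest in Euclidean distance to the query."""
    labels = np.asarray(reference_labels)
    classes = np.unique(labels)
    centroids = np.stack(
        [reference_prints[labels == c].mean(axis=0) for c in classes]
    )
    distances = np.linalg.norm(centroids - query_print, axis=1)
    return classes[np.argmin(distances)]

# Toy example: four genes on two chromosomes (made-up coordinates).
toy_print = gene_print(
    expression=[1, 2, 3, 4],
    chrom_index=[2, 1, 1, 2],
    start_pos=[10, 50, 5, 1],
    n_bins=2,
)

# Toy reference prints for two made-up subtypes "A" and "B".
toy_reference = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
toy_label = nearest_centroid_label(
    np.array([4.5, 5.0]), toy_reference, ["A", "A", "B", "B"]
)
```

Binning along genomic order keeps neighboring genes together, which is the property the abstract attributes to ordered and clustered gene expression; any real use would need the actual gene coordinates and a trained classifier in place of the centroid rule.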

Keywords

disease-specific cell subtypes (DSCSs) / gPRINT / cell subtypes annotation / single-cell transcriptomics / fibrosis / tendinopathy

Cite this article

Ruojin Yan, Chunmei Fan, Shen Gu, Tingzhang Wang, Zi Yin, Xiao Chen. Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT. Protein Cell, 2025, 16(8): 685-704. DOI: 10.1093/procel/pwaf001

RIGHTS & PERMISSIONS

The Author(s) 2025. Published by Oxford University Press on behalf of Higher Education Press.
