cKBET: assessing goodness of batch effect correction for single-cell RNA-seq

Yameng ZHAO , Yin GUO , Limin LI

Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (1) : 181901

DOI: 10.1007/s11704-022-2111-8
Interdisciplinary
RESEARCH ARTICLE

Abstract

Single-cell RNA sequencing reveals the gene structure and gene expression status of individual cells, and can therefore capture the heterogeneity between cells. However, batch effects caused by non-biological factors may hinder data integration and downstream analysis. Batch effects can be evaluated by visualizing the data, but this approach is subjective and inaccurate. In this work, we propose a quantitative method, cKBET, which considers batch and cell type information simultaneously. The cKBET method assesses batch effects by comparing, within each cell type, the global and local fractions of cells from different batches. We verify the performance of cKBET on simulated and real biological data sets. The experimental results show that cKBET is superior to existing methods in most cases. In general, cKBET can detect batch effects with either balanced or unbalanced cell types, and can thus be used to evaluate batch correction methods.
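
The idea of comparing local and global batch fractions within each cell type can be illustrated with a minimal sketch. This is not the published cKBET algorithm, whose exact statistic is not given in the abstract. The scoring rule (total variation distance between batch fractions), the brute-force nearest-neighbour search, the toy 2-D data and all function names are assumptions made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def batch_fractions(labels, batches):
    """Fraction of each batch label in `labels`."""
    counts = Counter(labels)
    return {b: counts[b] / len(labels) for b in batches}


def local_vs_global_score(points, batch, cell_type, k=10):
    """Per cell type: mean total variation distance between the batch
    fractions among each cell's k nearest neighbours (local) and the
    batch fractions of the whole cell type (global).
    Illustrative score only: about 0 means well mixed, larger means
    stronger batch separation."""
    by_type = defaultdict(list)
    for i, t in enumerate(cell_type):
        by_type[t].append(i)

    scores = {}
    for t, idx in by_type.items():
        batches = sorted({batch[i] for i in idx})
        if len(batches) < 2 or len(idx) <= k:
            continue  # one batch only, or too few cells to form a neighbourhood
        global_frac = batch_fractions([batch[i] for i in idx], batches)
        total = 0.0
        for i in idx:
            # Brute-force k nearest neighbours, restricted to the same cell type.
            neighbours = sorted(
                (j for j in idx if j != i),
                key=lambda j: math.dist(points[i], points[j]),
            )[:k]
            local_frac = batch_fractions([batch[j] for j in neighbours], batches)
            total += 0.5 * sum(abs(local_frac[b] - global_frac[b]) for b in batches)
        scores[t] = total / len(idx)
    return scores


def simulate(shift, n_per_group=40, seed=0):
    """Toy data: two cell types, two batches; batch 1 is offset by `shift`."""
    rng = random.Random(seed)
    centers = {"A": (0.0, 0.0), "B": (50.0, 0.0)}
    points, batch, cell_type = [], [], []
    for t, (cx, cy) in centers.items():
        for b in (0, 1):
            for _ in range(n_per_group):
                points.append((rng.gauss(cx + b * shift, 1.0), rng.gauss(cy, 1.0)))
                batch.append(b)
                cell_type.append(t)
    return points, batch, cell_type


mixed = local_vs_global_score(*simulate(shift=0.0))     # no batch effect
shifted = local_vs_global_score(*simulate(shift=20.0))  # strong batch effect
print("mixed:", mixed)
print("shifted:", shifted)
```

Because the neighbourhoods and the reference fractions are both computed inside one cell type, a cell type that is rare or missing in one batch does not by itself inflate the score. This is one plausible way to handle unbalanced cell types, offered here as an assumption rather than as the authors' design.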

Keywords

single-cell RNA-seq dataset / batch effect assessment / cKBET method

Cite this article

Yameng ZHAO, Yin GUO, Limin LI. cKBET: assessing goodness of batch effect correction for single-cell RNA-seq. Front. Comput. Sci., 2024, 18(1): 181901. DOI: 10.1007/s11704-022-2111-8

RIGHTS & PERMISSIONS

Higher Education Press
