cKBET: assessing goodness of batch effect correction for single-cell RNA-seq
Yameng ZHAO, Yin GUO, Limin LI
Single-cell RNA sequencing reveals the gene structure and gene expression status of a single cell, which can reflect the heterogeneity between cells. However, batch effects caused by non-biological factors may hinder data integration and downstream analysis. Although batch effects can be evaluated by visualizing the data, such evaluation is subjective and inaccurate. In this work, we propose a quantitative method, cKBET, which considers batch and cell type information simultaneously. The cKBET method assesses batch effects by comparing the global and local fractions of cells from different batches within different cell types. We verify the performance of cKBET on simulated and real biological data sets. The experimental results show that cKBET is superior to existing methods in most cases. In general, cKBET can detect batch effects with either balanced or unbalanced cell types, and can thus be used to evaluate batch correction methods.
single-cell RNA-seq dataset / batch effect assessment / cKBET method
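The abstract describes the core idea only in one sentence, so the following is a minimal sketch of one possible reading of it, not the authors' algorithm. It assumes that, for each cell, the batch composition among its k nearest neighbours of the same cell type (the local fraction) is compared with the batch composition of that cell type in the whole data set (the global fraction) using a Pearson chi-square statistic, in the spirit of kBET. The function names, the restriction of neighbours to the same cell type, the choice of `k`, and the default critical value (3.841, the 0.95 quantile of the chi-square distribution with one degree of freedom, appropriate for two batches) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def batch_rejection_rate(points, batches, cell_types, k=10, critical_value=3.841):
    """Fraction of cells whose local batch mix differs from the global mix.

    Hypothetical illustration only: neighbours are searched among cells of the
    same cell type, and the default critical value assumes two batches.
    """
    n = len(points)
    # Global batch counts within each cell type.
    global_counts = {}
    for batch, cell_type in zip(batches, cell_types):
        global_counts.setdefault(cell_type, Counter())[batch] += 1

    tested = 0
    rejected = 0
    for i in range(n):
        counts = global_counts[cell_types[i]]
        if len(counts) < 2:
            continue  # cell type seen in one batch only: nothing to compare
        same_type = [j for j in range(n) if j != i and cell_types[j] == cell_types[i]]
        if len(same_type) < k:
            continue  # too few cells of this type for a neighbourhood of size k
        same_type.sort(key=lambda j: math.dist(points[i], points[j]))
        local = Counter(batches[j] for j in same_type[:k])

        # Pearson chi-square: observed local counts vs. counts expected
        # from the global batch fractions of this cell type.
        total = sum(counts.values())
        statistic = 0.0
        for batch, global_count in counts.items():
            expected = k * global_count / total
            statistic += (local.get(batch, 0) - expected) ** 2 / expected

        tested += 1
        rejected += statistic > critical_value
    return rejected / tested if tested else float("nan")


def simulate(offset, n_per_group=50, seed=0):
    """Two cell types in two batches; `offset` shifts the second batch along x."""
    rng = random.Random(seed)
    points, batches, cell_types = [], [], []
    for type_index, cell_type in enumerate(["A", "B"]):
        for batch in (0, 1):
            for _ in range(n_per_group):
                x = rng.gauss(10.0 * type_index + offset * batch, 1.0)
                y = rng.gauss(0.0, 1.0)
                points.append((x, y))
                batches.append(batch)
                cell_types.append(cell_type)
    return points, batches, cell_types


mixed_rate = batch_rejection_rate(*simulate(offset=0.0))
separated_rate = batch_rejection_rate(*simulate(offset=100.0))
print("well-mixed batches:", mixed_rate)
print("separated batches:", separated_rate)
```

Under these assumptions, a rejection rate near zero suggests that batches are well mixed within each cell type, while a rate near one suggests a strong residual batch effect. Comparing within a cell type is one way to keep unbalanced cell type composition across batches from being mistaken for a batch effect.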
Yameng Zhao obtained her Bachelor's degree from Hunan University, China in 2020. She is currently a doctoral candidate at the School of Mathematics and Statistics, Xi'an Jiaotong University, China. Her research interest is applications in bioinformatics.
Yin Guo obtained her Bachelor's degree from Minzu University of China, China in 2018. She is currently a doctoral candidate at the School of Mathematics and Statistics, Xi'an Jiaotong University, China. Her research interest is applications in bioinformatics.
Limin Li obtained her Bachelor's and Master's degrees from Zhejiang University, China in 2004 and 2006, respectively. She received her PhD degree in mathematics from the University of Hong Kong, China in 2010. She then worked as a postdoctoral fellow at the Max Planck Institute for Intelligent Systems. She is currently a professor at the School of Mathematics and Statistics, Xi'an Jiaotong University, China. Her research interests include machine learning and its applications in bioinformatics.