Abstract
Background: Doublets (two or more cells captured together and sequenced as if they were a single cell) pose a major challenge for downstream analysis of single-cell RNA sequencing (scRNA-seq) data. Computational doublet-detection methods have been developed to identify and remove doublets from scRNA-seq data. Yet the default hyperparameter settings of these methods may not provide optimal performance.
Methods: We propose a strategy for tuning the hyperparameters of a state-of-the-art doublet-detection method. We use a full factorial design to explore the relationship between hyperparameters and detection accuracy on 16 real scRNA-seq datasets. The optimal hyperparameters are then obtained by fitting a response surface model and solving a convex optimization problem.
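The Methods summary names three ingredients: a full factorial design, a response surface model, and a convex optimization step. The following is a minimal Python/NumPy sketch of how these pieces can fit together, assuming two hypothetical hyperparameters in coded units (-1, 0, 1), a second-order (quadratic) response surface fitted by least squares, and a box-constrained maximization solved by projected gradient ascent. The factor levels and "accuracy" values are synthetic and illustrative, not taken from the study, and the solver is a generic stand-in rather than the authors' implementation.

```python
import itertools

import numpy as np


def full_factorial(levels):
    """Every combination of the given factor levels, one run per row."""
    return np.array(list(itertools.product(*levels)), dtype=float)


def quadratic_terms(X):
    """Second-order response-surface terms for two factors:
    1, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])


def fit_response_surface(X, y):
    """Ordinary least-squares fit of the quadratic response surface."""
    beta, *_ = np.linalg.lstsq(quadratic_terms(X), y, rcond=None)
    return beta


def maximize_surface(beta, lo=-1.0, hi=1.0, iters=1000, tol=1e-12):
    """Maximize the fitted quadratic over the box [lo, hi]^2.

    Writing the surface as f(x) = b0 + g.x + 0.5 x'Hx, maximizing a concave
    quadratic over a box is a convex problem. It is solved here by projected
    gradient ascent with step 1/L, where L is the largest |eigenvalue| of H.
    """
    _, b1, b2, b11, b22, b12 = beta
    g = np.array([b1, b2])
    H = np.array([[2 * b11, b12], [b12, 2 * b22]])
    eig = np.linalg.eigvalsh(H)
    if eig.max() >= 0:
        raise ValueError("fitted surface is not strictly concave")
    step = 1.0 / np.abs(eig).max()
    x = np.zeros(2)  # start at the centre of the design region
    for _ in range(iters):
        # Gradient step on f, then project back onto the box.
        x_new = np.clip(x + step * (g + H @ x), lo, hi)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Three coded levels for two hypothetical hyperparameters: 3 x 3 = 9 runs.
design = full_factorial([[-1, 0, 1], [-1, 0, 1]])

# Synthetic, noise-free "detection accuracy" that peaks at (0.3, -0.2).
accuracy = 0.8 - 0.1 * (design[:, 0] - 0.3) ** 2 - 0.05 * (design[:, 1] + 0.2) ** 2

beta = fit_response_surface(design, accuracy)
best = maximize_surface(beta)
print(best)  # close to the true peak (0.3, -0.2)
```

A three-level factorial is the smallest full factorial that can estimate every term of a quadratic surface. Because the fitted surface is concave, any local maximum inside the box is global, which is what makes the final step a convex problem.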
Results: We show that the optimal hyperparameters achieve top performance across scRNA-seq datasets under various biological conditions. Our tuning strategy can be applied to other computational doublet-detection methods, and it offers insights into hyperparameter tuning for computational methods in scRNA-seq data analysis more broadly.
Conclusions: Hyperparameter configuration significantly affects the performance of computational doublet-detection methods. Our study is the first attempt to systematically explore the optimal hyperparameters under various biological conditions and optimization objectives, and it provides much-needed guidance for hyperparameter tuning in computational doublet-detection methods.
Keywords
scRNA-seq / doublet detection / hyperparameter tuning / experimental design / response surface model
Cite this article
Nan Miles Xi, Angelos Vasilopoulos.
Tuning hyperparameters of doublet-detection methods for single-cell RNA sequencing data.
Quant. Biol., 2023, 11(3): 297-305 DOI:10.15302/J-QB-022-0324
RIGHTS & PERMISSIONS
The Author(s). Published by Higher Education Press.