Tuning hyperparameters of doublet-detection methods for single-cell RNA sequencing data

Nan Miles Xi; Angelos Vasilopoulos

doi:10.15302/J-QB-022-0324

Quant. Biol., 2023, Vol. 11, Issue 3: 297–305. DOI: 10.15302/J-QB-022-0324

NGS Data Analysis - RESEARCH ARTICLE

Abstract

Background: The presence of doublets in single-cell RNA sequencing (scRNA-seq) data poses a major challenge for downstream data analysis. Computational doublet-detection methods have been developed to remove doublets from scRNA-seq data. However, the default hyperparameter settings of these methods may not yield optimal performance.

Methods: We propose a strategy to tune hyperparameters for a cutting-edge doublet-detection method. We utilize a full factorial design to explore the relationship between hyperparameters and detection accuracy on 16 real scRNA-seq datasets. The optimal hyperparameters are obtained by a response surface model and convex optimization.

Results: We show that the optimal hyperparameters provide top performance across scRNA-seq datasets under various biological conditions. Our tuning strategy can be applied to other computational doublet-detection methods. It also offers insights into hyperparameter tuning for broader computational methods in scRNA-seq data analysis.

Conclusions: The hyperparameter configuration significantly impacts the performance of computational doublet-detection methods. Our study is the first attempt to systematically explore the optimal hyperparameters under various biological conditions and optimization objectives. Our study provides much-needed guidance for hyperparameter tuning in computational doublet-detection methods.

Author summary

Doublets are a major confounder in single-cell RNA sequencing (scRNA-seq) data analysis. Computational doublet-detection methods aim to remove doublets from scRNA-seq data. The performance of these methods depends on appropriate hyperparameter settings. In this study, we explore the optimal hyperparameters for scDblFinder, a cutting-edge doublet-detection method. Our optimization utilizes a full factorial design, a response surface model, and 16 real scRNA-seq datasets. The optimal hyperparameters achieve top doublet-detection performance under a wide range of biological conditions. Our methodology is applicable to broader computational methods in scRNA-seq data analysis.
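The workflow summarized above (a full factorial design, a response surface model, then optimization of the fitted surface) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes two hypothetical hyperparameters on a coded scale from -1 to 1, and a synthetic, noise-free accuracy function (`synthetic_score`) stands in for running a doublet-detection method on real datasets. All function names and numeric values are assumptions made for illustration.

```python
import itertools
import numpy as np

def full_factorial(levels):
    """Return every combination of the factor levels, one row per run."""
    return np.array(list(itertools.product(*levels)), dtype=float)

def quadratic_features(X):
    """Second-order response surface terms for two factors:
    intercept, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

def fit_response_surface(X, y):
    """Least-squares fit of the second-order model."""
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return coef

def stationary_point(coef):
    """Solve gradient = 0 for the fitted quadratic and report whether the
    Hessian is negative definite (i.e., the point is a maximum)."""
    _, b1, b2, b11, b22, b12 = coef
    hessian = np.array([[2 * b11, b12], [b12, 2 * b22]])
    gradient_at_zero = np.array([b1, b2])
    x_star = np.linalg.solve(hessian, -gradient_at_zero)
    is_max = bool(np.all(np.linalg.eigvalsh(hessian) < 0))
    return x_star, is_max

def synthetic_score(X):
    """Hypothetical detection accuracy: a concave surface peaking at
    (0.3, -0.2). A real study would measure this on labeled datasets."""
    return 0.9 - 0.5 * (X[:, 0] - 0.3) ** 2 - 0.8 * (X[:, 1] + 0.2) ** 2

# Two hypothetical hyperparameters, five coded levels each: 25 runs in total.
levels = [np.linspace(-1, 1, 5), np.linspace(-1, 1, 5)]
design = full_factorial(levels)

y = synthetic_score(design)
coef = fit_response_surface(design, y)
x_star, is_max = stationary_point(coef)
print("design runs:", len(design))
print("estimated optimum (coded units):", x_star, "is maximum:", is_max)
```

Because the fitted surface here is concave, maximizing it is a convex problem and the stationary point is the global maximum. With noisy measurements or an optimum outside the explored range, a bounded solver would be a more appropriate choice than solving the stationarity condition directly.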

Keywords

scRNA-seq / doublet detection / hyperparameter tuning / experimental design / response surface model

Cite this article

Nan Miles Xi, Angelos Vasilopoulos. Tuning hyperparameters of doublet-detection methods for single-cell RNA sequencing data. Quant. Biol., 2023, 11(3): 297‒305 https://doi.org/10.15302/J-QB-022-0324

Data availability

The 16 scRNA-seq datasets used in this study are available in the Zenodo repository (DOI: 4562782).

SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at https://doi.org/10.15302/J-QB-022-0324.

ACKNOWLEDGEMENTS

We would like to express our sincere gratitude to Dr. Lin Wang at Purdue University Department of Statistics for generously sharing her expert insights and knowledge regarding statistical analysis.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Nan Miles Xi and Angelos Vasilopoulos declare that they have no conflict of interest or financial conflicts to disclose.

This article does not contain any studies with human or animal materials performed by any of the authors.

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.