Imputation of single-cell gene expression with an autoencoder neural network

Md. Bahadur Badsha, Rui Li, Boxiang Liu, Yang I. Li, Min Xian, Nicholas E. Banovich, Audrey Qiuyan Fu

Quant. Biol. 2020, Vol. 8, Issue 1: 78-94.

DOI: 10.1007/s40484-019-0192-7
RESEARCH ARTICLE

Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed by a single experiment, scRNA-seq still has several limitations, including high rates of dropouts, which result in a large number of genes having zero read count in the scRNA-seq data, and complicate downstream analyses.

Methods: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.
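The idea in the Methods paragraph can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-hidden-layer architecture, plain gradient descent, the hyperparameters, and all function names (`init_params`, `forward`, `masked_mse`, `train`, `impute`) are assumptions made for illustration. The sketch assumes that "treating zeros as missing" means computing the reconstruction loss only over nonzero entries, and that a TRANSLATE-style run simply starts training from parameters fitted on a reference data set rather than from random values.

```python
import numpy as np

def init_params(n_genes, hidden, seed=0):
    """Random initial parameter values (the LATE-style starting point)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_genes, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, n_genes)),
        "b2": np.zeros(n_genes),
    }

def forward(X, p):
    """Encode with a ReLU hidden layer, decode linearly."""
    Z = X @ p["W1"] + p["b1"]
    H = np.maximum(Z, 0.0)
    Y = H @ p["W2"] + p["b2"]
    return Z, H, Y

def masked_mse(X, Y):
    """Mean squared error over nonzero entries of X only (zeros = missing)."""
    mask = X > 0
    return float(((Y - X)[mask] ** 2).mean())

def train(X, params, epochs=300, lr=0.01):
    """Gradient descent on the masked loss, starting from `params`.

    Passing random parameters mimics LATE; passing parameters previously
    fitted on a reference data set mimics TRANSLATE (an assumption about
    how the transfer step could be wired up).
    """
    p = {k: v.copy() for k, v in params.items()}  # do not mutate the input
    mask = (X > 0).astype(float)
    n_obs = max(mask.sum(), 1.0)
    for _ in range(epochs):
        Z, H, Y = forward(X, p)
        dY = 2.0 * (Y - X) * mask / n_obs      # gradient flows only through observed entries
        dZ = (dY @ p["W2"].T) * (Z > 0)        # backpropagate through the ReLU
        grads = {
            "W1": X.T @ dZ,
            "b1": dZ.sum(axis=0),
            "W2": H.T @ dY,
            "b2": dY.sum(axis=0),
        }
        for k in p:
            p[k] -= lr * grads[k]
    return p

def impute(X, p):
    """Assumption: keep observed values and fill only the zeros."""
    _, _, Y = forward(X, p)
    return np.where(X > 0, X, Y)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = rng.uniform(0, 1, (60, 3)) @ rng.uniform(0, 1, (3, 20))  # toy cells x genes
    X = truth * (rng.uniform(size=truth.shape) > 0.3)                # simulate dropouts
    reference = truth * (rng.uniform(size=truth.shape) > 0.1)        # toy reference data

    late = train(X, init_params(20, 8))                              # random start
    pretrained = train(reference, init_params(20, 8))                # fit on reference
    translate = train(X, pretrained)                                 # warm start
    print(masked_mse(X, forward(X, late)[2]),
          masked_mse(X, forward(X, translate)[2]))
```

The only difference between the two runs in the usage section is the starting point passed to `train`, which is the distinction the abstract draws between LATE and TRANSLATE.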

Results: On both simulated and real data, LATE and TRANSLATE outperform existing scRNA-seq imputation methods, achieving lower mean squared error in most cases, recovering nonlinear gene-gene relationships, and better separating cell types. They are also highly scalable and can efficiently process over 1 million cells in just a few hours on a GPU.

Conclusions: We demonstrate that our nonparametric approach to imputation based on autoencoders is powerful and highly efficient.

Keywords

single-cell / gene expression / deep learning / autoencoder

Cite this article

Md. Bahadur Badsha, Rui Li, Boxiang Liu, Yang I. Li, Min Xian, Nicholas E. Banovich, Audrey Qiuyan Fu. Imputation of single-cell gene expression with an autoencoder neural network. Quant. Biol., 2020, 8(1): 78-94. DOI: 10.1007/s40484-019-0192-7

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature
