LIBRA: an adaptative integrative tool for paired single-cell multi-omics data
Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Sisi Qu, Alberto Maillo, Narsis A. Kiani, Felipe Prosper, Jesper Tegner, David Gomez-Cabrero
Background: Single-cell multi-omics technologies enable a deep, system-level understanding of the biology of cells and tissues. However, an integrative, and possibly systems-based, analysis that captures the different modalities is challenging. In response, bioinformatics and machine learning methodologies are being developed for multi-omics single-cell analysis. It remains unclear whether current tools can address both modality integration and prediction across modalities without requiring extensive parameter fine-tuning.
Methods: We designed LIBRA, a neural network-based framework, to learn the translation between paired multi-omics profiles so that a shared latent space is constructed. Additionally, we implemented a variant, aLIBRA, that allows automatic fine-tuning by identifying parameter combinations that optimize both the integrative and predictive tasks. All model parameters and evaluation metrics are made available to users with minimal user iteration. Furthermore, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries on GitHub (TranslationalBioinformaticsUnit/LIBRA).
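As an illustration of the translation idea described in the Methods, the following is a minimal sketch, not the LIBRA implementation: a small encoder-decoder network, written in plain NumPy, is trained to predict one modality from its paired counterpart, and the bottleneck activations play the role of a shared latent space. The synthetic data, the function names (`make_paired_data`, `train_translator`), the tanh activation, and all hyperparameter values are assumptions chosen for illustration only.

```python
import numpy as np


def make_paired_data(n_cells=200, d_a=8, d_b=6, seed=0):
    """Simulate two modalities measured on the same cells from a shared 2-D state."""
    rng = np.random.default_rng(seed)
    state = rng.normal(size=(n_cells, 2))
    x_a = 0.5 * state @ rng.normal(size=(2, d_a)) + 0.05 * rng.normal(size=(n_cells, d_a))
    x_b = 0.5 * state @ rng.normal(size=(2, d_b)) + 0.05 * rng.normal(size=(n_cells, d_b))
    return x_a, x_b


def train_translator(x_a, x_b, latent_dim=2, lr=0.1, epochs=500, seed=0):
    """Fit an encoder (A -> latent) and a decoder (latent -> B) by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    w_enc = rng.normal(scale=0.1, size=(x_a.shape[1], latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, x_b.shape[1]))
    history = []
    for _ in range(epochs):
        z = np.tanh(x_a @ w_enc)             # latent representation of each cell
        err = z @ w_dec - x_b                # prediction error on modality B
        history.append(float(np.mean(err ** 2)))
        grad_pred = 2.0 * err / err.size     # derivative of the mean squared error
        grad_dec = z.T @ grad_pred
        grad_z = grad_pred @ w_dec.T
        grad_enc = x_a.T @ (grad_z * (1.0 - z ** 2))  # backpropagate through tanh
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, history


x_a, x_b = make_paired_data()
w_enc, w_dec, history = train_translator(x_a, x_b)
latent = np.tanh(x_a @ w_enc)  # per-cell embedding that could be clustered downstream
print(f"MSE at first epoch: {history[0]:.3f}, at last epoch: {history[-1]:.3f}")
```

In this sketch, `latent` would be the input to clustering (the integration task), while the decoder output would be compared against the measured second modality (the prediction task).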
Results: LIBRA was evaluated on eight multi-omic single-cell datasets, covering three combinations of omics. We observed that LIBRA is a state-of-the-art tool when evaluated on its ability to increase cell-type (clustering) resolution in the integrated latent space. Furthermore, when assessing predictive power across data modalities, such as predicting chromatin accessibility from gene expression, LIBRA outperforms existing tools. As expected, adaptive parameter optimization (aLIBRA) significantly boosted the performance of learning predictive models from paired datasets.
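The adaptive selection of parameter combinations can be pictured with a small grid search that scores each configuration on both tasks and keeps the one with the best combined rank. This is a hedged sketch only: the rank-sum criterion, the hyperparameter names (`latent_dim`, `n_layers`), and the toy scoring function are assumptions for illustration and are not taken from aLIBRA itself.

```python
from itertools import product


def rank(values, reverse=False):
    """Return the rank (0 = best) of each value; lower is better unless reverse=True."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def select_config(grid, evaluate):
    """Pick the configuration with the best summed rank over both tasks."""
    configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
    scores = [evaluate(cfg) for cfg in configs]
    integration_rank = rank([s[0] for s in scores], reverse=True)  # higher is better
    prediction_rank = rank([s[1] for s in scores])                 # lower is better
    totals = [a + b for a, b in zip(integration_rank, prediction_rank)]
    best_index = min(range(len(configs)), key=totals.__getitem__)
    return configs[best_index]


def toy_evaluate(cfg):
    """Stand-in for training a model and measuring both tasks (made-up formulas)."""
    integration_score = 1.0 - abs(cfg["latent_dim"] - 10) / 50  # peaks at latent_dim = 10
    prediction_mse = 0.1 + abs(cfg["n_layers"] - 2) * 0.05      # lowest at n_layers = 2
    return integration_score, prediction_mse


grid = {"latent_dim": [10, 50], "n_layers": [1, 2, 4]}
best = select_config(grid, toy_evaluate)
print(best)
```

In practice, `toy_evaluate` would be replaced by a function that trains a model with the given configuration and returns a clustering-quality score together with a prediction error; ties in either score are broken here by grid order, which is an arbitrary choice.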
Conclusion: LIBRA is a versatile tool that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. LIBRA is a data-driven robust platform that includes an adaptive learning scheme.
There is a need for tools that integrate single-cell multi-omic data while addressing several integrative challenges simultaneously. To this end, we designed a deep-learning-based tool, LIBRA, that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. Furthermore, when assessing the predictive power across data modalities, LIBRA outperforms existing tools. LIBRA and its adaptive scheme, aLIBRA, allow automatic fine-tuning with limited user effort. Additionally, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries.
single-cell / multi-omic / Autoencoder / auto-finetuning
NN | Neural networks |
GEO | Gene Expression Omnibus |
SLS | Shared latent space |
PJI | Pairwise Jaccard Index |
DS | Data set |
predRNA | Predicted RNA |
predATAC | Predicted ATAC |
MSE | Mean squared error |
SNARE-seq | Droplet-based technology to profile chromatin accessibility and gene expression from the same cells |
CITE-seq | Method for performing RNA sequencing while gaining quantitative and qualitative information on surface proteins, using available antibodies, at the single-cell level |
Paired-seq | Combinatorial indexing strategy to simultaneously tag both the open chromatin fragments generated by the Tn5 transposase and the cDNA molecules generated from reverse transcription |
SHARE-seq | Strategy that uses three rounds of barcoding, ligating barcoded adaptors to both RNA (gene expression) and tagmented DNA (chromatin accessibility), to achieve multi-omic profiling of the same single cells |
10X | 10X Genomics single-cell multiomics solutions |
scNMT-seq | Method to profile DNA methylation (CpG) and chromatin accessibility (GpC) |