LIBRA: an adaptative integrative tool for paired single-cell multi-omics data

Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Sisi Qu, Alberto Maillo, Narsis A. Kiani, Felipe Prosper, Jesper Tegner, David Gomez-Cabrero

Quant. Biol. ›› 2023, Vol. 11 ›› Issue (3) : 246-259. DOI: 10.15302/J-QB-022-0318
RESEARCH ARTICLE

Abstract

Background: Single-cell multi-omics technologies enable an in-depth, systems-level understanding of cells and tissues. However, an integrative and possibly systems-based analysis that captures the different modalities is challenging. In response, bioinformatics and machine learning methodologies are being developed for multi-omics single-cell analysis. It remains unclear whether current tools can address both modality integration and prediction across modalities without requiring extensive parameter fine-tuning.

Methods: We designed LIBRA, a neural network-based framework that learns to translate between paired multi-omics profiles and thereby constructs a shared latent space. We also implemented a variant, aLIBRA, that fine-tunes automatically by identifying parameter combinations that optimize both the integrative and the predictive task. All model parameters and evaluation metrics are made available to users with minimal user effort. Furthermore, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries on GitHub (TranslationalBioinformaticsUnit/LIBRA).

Results: LIBRA was evaluated on eight multi-omic single-cell datasets covering three combinations of omics. We observed that LIBRA performs at a state-of-the-art level in increasing cell-type (clustering) resolution in the integrated latent space. Furthermore, when assessing predictive power across data modalities, such as predicting chromatin accessibility from gene expression, LIBRA outperforms existing tools. As expected, adaptive parameter optimization (aLIBRA) significantly boosted the performance of predictive models learned from paired datasets.

Conclusion: LIBRA is a versatile tool that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. LIBRA is a data-driven robust platform that includes an adaptive learning scheme.
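
The translation idea described in the Methods paragraph can be illustrated with a minimal sketch. It is not LIBRA's implementation: it assumes a purely linear encoder and decoder trained by plain gradient descent on mean squared error, uses synthetic paired data in place of real omics profiles, and replaces aLIBRA's tuning with a toy grid search over the latent dimension scored only by validation error. All names (`train_translator`, `predict`, `scores`, `best_dim`) are hypothetical.

```python
import numpy as np


def train_translator(x_a, x_b, latent_dim, epochs=500, lr=0.1, seed=0):
    """Fit a linear encoder (modality A -> latent) and decoder (latent -> modality B).

    Assumption: a linear stand-in for a neural network, trained by
    full-batch gradient descent on the mean squared error.
    """
    rng = np.random.default_rng(seed)
    n_a, n_b = x_a.shape[1], x_b.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_a, latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, n_b))
    history = []
    for _ in range(epochs):
        z = x_a @ w_enc                  # latent representation of each cell
        err = z @ w_dec - x_b            # error of the predicted modality B
        history.append(float(np.mean(err ** 2)))
        scale = 2.0 / err.size           # derivative of the mean of squares
        grad_dec = scale * (z.T @ err)
        grad_enc = scale * (x_a.T @ (err @ w_dec.T))
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, history


def predict(x_a, w_enc, w_dec):
    """Translate modality A profiles into predicted modality B profiles."""
    return x_a @ w_enc @ w_dec


# Synthetic paired data (assumption: stands in for two measured modalities).
rng = np.random.default_rng(42)
n_cells, n_a, n_b, true_dim = 300, 20, 10, 3
x_a = rng.normal(size=(n_cells, n_a))
mix = rng.normal(size=(n_a, true_dim)) / np.sqrt(n_a)
load = rng.normal(size=(true_dim, n_b)) / np.sqrt(true_dim)
x_b = x_a @ mix @ load + 0.1 * rng.normal(size=(n_cells, n_b))

train, valid = slice(0, 200), slice(200, 300)

# Toy grid search: keep the latent size with the lowest validation MSE.
scores = {}
for latent_dim in (2, 4, 8):
    w_enc, w_dec, history = train_translator(x_a[train], x_b[train], latent_dim)
    pred = predict(x_a[valid], w_enc, w_dec)
    scores[latent_dim] = float(np.mean((pred - x_b[valid]) ** 2))

best_dim = min(scores, key=scores.get)
print("validation MSE per latent size:", scores)
print("selected latent size:", best_dim)
```

In this sketch the encoder output `z` plays the role of the shared latent space, and the validation error plays the role of the predictive metric. A fuller version would also score the integrative task, for example clustering quality in the latent space, before selecting a configuration.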

Author summary

There is a need for tools that integrate single-cell multi-omic data while addressing several integrative challenges simultaneously. To this end, we designed LIBRA, a deep learning-based tool that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. Furthermore, when assessing the predictive power across data modalities, LIBRA outperforms existing tools. LIBRA and its adaptive scheme, aLIBRA, allow automatic fine-tuning with limited user effort. Additionally, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries.

Keywords

single-cell / multi-omic / Autoencoder / auto-finetuning

Cite this article

Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Sisi Qu, Alberto Maillo, Narsis A. Kiani, Felipe Prosper, Jesper Tegner, David Gomez-Cabrero. LIBRA: an adaptative integrative tool for paired single-cell multi-omics data. Quant. Biol., 2023, 11(3): 246‒259 https://doi.org/10.15302/J-QB-022-0318


AVAILABILITY AND REQUIREMENTS

Project name: LIBRA.
Project home page: “GitHub website (TranslationalBioinformaticsUnit/LIBRA)”.
Operating system(s): platform independent. Tested on Linux.
Programming language(s): Python (sc-libra, the LIBRA package implementation on PyPI), Jupyter Notebook, R and R Markdown.
License: GPL-3.0 license.
Any restrictions to use by non-academics: none.

AVAILABILITY OF DATA AND MATERIALS

The datasets re-analyzed during the current study are available in the NCBI GEO repository under accession numbers GSE126074, GSE128639, GSE130399, GSE140203, GSE194122 and GSE109262, and from the 10X Genomics website repository. The developed package, its online documentation, and the code used for the re-analysis are available at: sc-libra package: PyPI (sc-libra); sc-libra online docs: Read the Docs (sc-libra); GitHub repository: GitHub website (TranslationalBioinformaticsUnit/LIBRA); clone of the GitHub repository plus data repository: Figshare (LIBRA).

ABBREVIATIONS

NN: Neural networks
GEO: Gene Expression Omnibus
SLS: Shared latent space
PJI: Pairwise Jaccard Index
DS: Data set
predRNA: Predicted RNA
predATAC: Predicted ATAC
MSE: Mean squared error
SNARE-seq: Droplet-based technology to profile chromatin accessibility and gene expression from the same cells
CITE-seq: Method for performing RNA sequencing along with gaining quantitative and qualitative information on surface proteins with available antibodies on a single-cell level
Paired-seq: Combinatorial indexing strategy to simultaneously tag both the open chromatin fragments generated by the Tn5 transposases and the cDNA molecules generated from reverse transcription
SHARE-seq: Strategy that uses three rounds of barcodes by ligating barcoded adaptors to both RNA (gene expression) and tagmented DNA (chromatin accessibility) to achieve multi-omic profiling from the same single cells
10X: 10X Genomics single-cell multiomics solutions
scNMT-seq: Method to look at methylation (CpG) and chromatin accessibility (GpC)

SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at https://doi.org/10.15302/J-QB-022-0318.

AUTHOR CONTRIBUTIONS

XMM and DGC designed LIBRA and the computational experiments shown. XMM performed most of the experiments and analyzed the results. SQ conducted the initial experiments associated with BABEL. XMM, JT and DGC wrote the first draft and the final version. SQ, SK, RL, AM, NK, FP, and JT provided additional insights into the experiments and the text. All authors reviewed the manuscript before submission.

ACKNOWLEDGEMENTS

This work was supported by grants from the European Union under the Horizon 2020 programme (MultipleMS grant agreement 733161) to NK; and from the Spanish Government, through project PID2019-111192GA-I00 (MICINN) to DGC.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Sisi Qu, Alberto Maillo, Narsis A. Kiani, Felipe Prosper, Jesper Tegner and David Gomez-Cabrero declare that they have no conflict of interest or financial conflicts to disclose.
This article does not contain any studies with human or animal materials performed by any of the authors.

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

2023 The Author(s). Published by Higher Education Press.