LIBRA: an adaptative integrative tool for paired single-cell multi-omics data

Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Sisi Qu, Alberto Maillo, Narsis A. Kiani, Felipe Prosper, Jesper Tegner, David Gomez-Cabrero

Quant. Biol. ›› 2023, Vol. 11 ›› Issue (3): 246-259. DOI: 10.15302/J-QB-022-0318

RESEARCH ARTICLE

Abstract

Background: Single-cell multi-omics technologies allow a profound, systems-level understanding of the biology of cells and tissues. However, an integrative, systems-based analysis capturing the different modalities is challenging. In response, bioinformatics and machine learning methodologies are being developed for multi-omics single-cell analysis. It remains unclear whether current tools can address the dual task of integrating modalities and predicting across them without requiring extensive parameter fine-tuning.

Methods: We designed LIBRA, a neural network based framework, to learn translation between paired multi-omics profiles so that a shared latent space is constructed. Additionally, we implemented a variation, aLIBRA, that allows automatic fine-tuning by identifying parameter combinations that optimize both the integrative and predictive tasks. All model parameters and evaluation metrics are made available to users with minimal user interaction. Furthermore, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries at GitHub (TranslationalBioinformaticsUnit/LIBRA).

Results: LIBRA was evaluated on eight multi-omic single-cell datasets, including three combinations of omics. We observed that LIBRA is a state-of-the-art tool when evaluating the ability to increase cell-type (clustering) resolution in the integrated latent space. Furthermore, when assessing the predictive power across data modalities, such as predicting chromatin accessibility from gene expression, LIBRA outperforms existing tools. As expected, adaptive parameter optimization (aLIBRA) significantly boosted the performance of learning predictive models from paired datasets.

Conclusion: LIBRA is a versatile tool that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. LIBRA is a data-driven robust platform that includes an adaptive learning scheme.

Graphical abstract

Keywords

single-cell / multi-omic / Autoencoder / auto-finetuning

Cite this article

Xabier Martinez-de-Morentin, Sumeer A. Khan, Robert Lehmann, Sisi Qu, Alberto Maillo, Narsis A. Kiani, Felipe Prosper, Jesper Tegner, David Gomez-Cabrero. LIBRA: an adaptative integrative tool for paired single-cell multi-omics data. Quant. Biol., 2023, 11(3): 246-259 DOI:10.15302/J-QB-022-0318

1 INTRODUCTION

Single-cell genomics technologies set the stage for unraveling the intrinsically complex organization of tissues at single-cell resolution by simultaneously profiling several layers of transcriptional regulation [1-3]. Recent multi-omics single-cell technologies enable joint profiling of "chromatin accessibility & mRNA profiles" (e.g., SNARE-seq [1], sci-CAR [2], SHARE-seq [3], Paired-seq [4], 10X Genomics [5]), of "mRNA profiles & protein antibody-derived tags" (CITE-seq [6]), and even of more than two omics, such as the "chromatin accessibility, DNA methylation, and transcriptome profiling" of the scNMT-seq [7] protocol. As a result of such novel technologies, it has become necessary to develop methods that integrate multi-omic profiles at the single-cell level [8] (Fig.1). The rationale is that current state-of-the-art bulk methodologies [9-12] and frameworks [13] cannot analyze single-cell data optimally [14-16]. Initially, methodologies such as Seurat3 [5] allowed single-cell multiome integrative analysis; however, Seurat3 [5] does not use the information derived from the paired nature of the data (that is, several omic profiles obtained from the same cell). More recently, methodologies that use the paired information have been developed [8]. For example, machine learning tools such as MOFA+ [17] and Seurat4 [18] allow the identification of an integrated space that can be used for improved cell clustering. However, such tools have two potentially limiting factors: scalability and robustness. Furthermore, they do not generate models that allow the estimation of one omic profile from a second omic profile. Deep learning-based methodologies have been developed to overcome such limitations. The first was BABEL [19], which aimed at generating predictive models that "translate" between data types. Other methodologies followed this idea, such as KPNNs [20], GAT [21], and scNym [22]. Increasingly complex models have been employed to solve specific tasks based on single-cell data, including models based on "machine language translation", such as long short-term memory (LSTM) networks for prediction [23], and emerging transformers such as scBERT [24] for annotation.

Recently, a Multi-modal Single-cell Data Integration Competition [25] was organized by Neural Information Processing Systems (NeurIPS). The NeurIPS challenge addressed several tasks: (i) predicting one modality from another (prediction), (ii) matching cells between modalities, and (iii) jointly learning representations of cellular identity (representation). In general, neural networks were the most popular approach and provided, in most cases, the best results. However, while the best methodologies used architectures of limited complexity, it became apparent during the competition that, regardless of the specific architecture, the methodologies required extensive fine-tuning of hyperparameters for each task and dataset. Furthermore, observing that no method could win more than one task, it was concluded that the "no free lunch" theorem [26] (no method works best for all tasks) applies to multi-omic analysis.

Therefore, we propose LIBRA (Fig.1), an encoder-decoder architecture based on AutoEncoders (AE) that competitively performs two of the three tasks addressed in the NeurIPS challenge (prediction and representation). LIBRA is inspired by ideas from neural machine translation [27]; however, the implementation selected is an AE-based framework. Similar to BABEL [19], LIBRA integrates single-cell multi-omics data by leveraging paired single-cell omics data. In the first part of the LIBRA development, we fine-tuned LIBRA using a step-wise optimization strategy that considers AE-associated quality measures and a new metric, the Preserved Pairwise Jaccard Index (PPJI). PPJI characterizes and quantifies whether the integrated space allows for finer granularity in detecting cellular subtypes. Interestingly, we observed that PPJI is a valuable metric for quantifying the added value of a multi-omic joint representation. Next, we compared LIBRA with the current state-of-the-art tools across several datasets, data modality combinations, and tasks. LIBRA performed competitively in all cases. It is worth noting that LIBRA was among the top 10 in two NeurIPS sub-challenges (jointly learning representations and predicting modalities) according to the challenge's metrics. Finally, to address the no-free-lunch observation, we combined LIBRA with an automatic fine-tuning paradigm [28] that allows LIBRA parameter optimization "on the run"; we denoted this version aLIBRA. aLIBRA not only improves the results significantly, outperforming available methodologies, but most importantly provides instantiations of LIBRA that are competitive at both integration and prediction with reduced computational times. LIBRA is freely available in both R and Python (including tutorials) for multi-modal single-cell analysis.

2 RESULTS AND DISCUSSION

2.1 LIBRA framework

LIBRA "translates" [12] between omics. Implemented using Autoencoders, LIBRA encodes one omic and decodes the other to and from a reduced space, where the decoder minimizes the distance to the second, paired data type (joint translation and projection). Briefly, LIBRA consists of two neural networks (NNs) (Fig.2); the first NN (NN1) is designed similarly to an Autoencoder, with the difference that the input (dt1) and output (dt2) correspond to the two modalities of a paired multi-modal dataset (Fig. S1A).

Considering only one hidden layer, the encoder part of NN1 aims to encode the input omic expression matrix, denoted $x \in \mathbb{R}^d$, into the latent variables $h$ following this formula:

$$h = \sigma(Wx + b)$$

where $\sigma$ is the element-wise activation function, $W$ is the weight matrix, and $b$ is the bias vector. The activation function used in the LIBRA implementation is leaky-ReLU [29]; this decision was taken because the standard ReLU produced a high rate of dead nodes, which lowered performance and introduced uncertainty in the outcomes. Instead of being 0 when $z < 0$, the activation has a small, non-zero, constant gradient $\alpha$:

$$R(z) = \begin{cases} z & z > 0 \\ \alpha z & z \le 0 \end{cases}, \qquad R'(z) = \begin{cases} 1 & z > 0 \\ \alpha & z \le 0 \end{cases}$$

This choice prevents LIBRA training from introducing dead nodes caused by the sparse nature of single-cell data. Weights are initialized with the Xavier uniform initializer and biases with zeros. In the decoding part of the autoencoder, the output omic expression matrix is used to force minimization of the loss against the profiles of the second omic instead of the original profiles. This process is repeated during training, using backpropagation for weight updating. The loss function employed is the mean squared error (MSE). An early-stopping rule was added to save time when the evaluation function cannot improve the MSE within a fixed patience value. In addition, a learning-rate plateau callback was added to reduce the learning rate when no improvement in the loss is obtained within a fixed patience value. See Supplementary Materials and Methods for the values and hyperparameters employed.
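To make the architecture concrete, below is a minimal sketch of how NN1 could be assembled with tf.keras. The layer sizes, patience values, and alpha shown are illustrative placeholders, not the defaults shipped with the sc-libra package.

```python
# Minimal sketch of the NN1 "translation" autoencoder, assuming tf.keras.
# Layer sizes, patience values, and alpha are illustrative placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_nn1(dt1_dim, dt2_dim, latent_dim=10, alpha=0.3):
    """Encode omic dt1 into a shared latent space, decode towards omic dt2."""
    inputs = layers.Input(shape=(dt1_dim,))
    # Encoder: leaky-ReLU avoids the dead nodes that plain ReLU produces
    # on sparse single-cell matrices; Xavier uniform = glorot_uniform.
    h = layers.Dense(512, kernel_initializer="glorot_uniform",
                     bias_initializer="zeros")(inputs)
    h = layers.LeakyReLU(alpha)(h)
    latent = layers.Dense(latent_dim, name="shared_latent_space")(h)
    # Decoder: the loss is computed against the *paired* dt2 profile,
    # which forces the latent space to carry cross-omic information.
    d = layers.Dense(512)(latent)
    d = layers.LeakyReLU(alpha)(d)
    outputs = layers.Dense(dt2_dim)(d)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Early stopping and learning-rate plateau callbacks, as described above.
cbs = [callbacks.EarlyStopping(monitor="val_loss", patience=10,
                               restore_best_weights=True),
       callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5)]
# model = build_nn1(dt1_matrix.shape[1], dt2_matrix.shape[1])
# model.fit(dt1_matrix, dt2_matrix, validation_split=0.1, callbacks=cbs)
```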

Thanks to this processing, a shared latent space (SLS) for the two data types can be learned effectively (Fig.2). While NN1 identifies the SLS, we considered it necessary to implement a second NN (NN2) that maps dt2 to the generated SLS, to ensure and quantify that the projected space correctly embeds the information of the dt2 cells with high quality (Fig.2). NN2 (Fig. S1B) uses the same encoding strategy but takes dt2 as input and the generated SLS as output. See Supplementary Materials and Methods for the complete formula.

2.2 Evaluation functions

To assess the performance of the LIBRA integration, we designed several quality metrics (Fig.2, C). The first set of metrics, Q1 and Q2 (Fig.2, upper part), is associated with the training of the neural networks; the mean squared error (MSE) and the Euclidean distance are used to evaluate the training of NN1 and NN2, respectively. The second set of metrics (Fig.2, lower part) is implemented to evaluate the following LIBRA applications: (i) the added value of integration, and (ii) the predictive power between omic profiles. The following sub-sections describe each evaluation function; for further technical details, see Supplementary Information.

Q1 has been computed as the MSE for NN1:

$$\mathrm{MSE}_{NN_1} = \frac{1}{n}\sum_{i=1}^{n}\left(dt_{2,i} - \hat{dt}_{2,i}\right)^2$$

where $\hat{dt}$ denotes the estimated value, $dt$ denotes the original value, and $n$ is the total number of cells for the given pair of single-cell data modalities.

Q2 has been computed as the Euclidean distance between the generated SLSs:

Since NN2 is trained using MSE as its loss function, the Euclidean distance Q2 between the SLS generated by NN1 and the predicted output values of NN2 can easily be calculated, over all cells, as the square root of the MSE obtained in NN2:

$$Q_2 = \sqrt{\mathrm{MSE}_{NN_2}}$$

In the case of NN2, and based on the observed bimodal distribution (Fig. S1C), we evaluated separately the cells used for training and those used for validation in NN1.
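Both training-quality metrics reduce to a few lines of numpy; the sketch below assumes dense arrays for readability.

```python
# Sketch of the two training-quality metrics, assuming dense numpy arrays.
import numpy as np

def q1_mse(dt2, dt2_hat):
    """Q1: mean squared error between observed and predicted dt2 profiles."""
    return np.mean((dt2 - dt2_hat) ** 2)

def q2_distance(sls_nn1, sls_nn2_pred):
    """Q2: Euclidean distance between the NN1 latent space and the NN2
    prediction, i.e., the square root of the NN2 MSE over all cells."""
    return np.sqrt(np.mean((sls_nn1 - sls_nn2_pred) ** 2))
```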

2.2.1 Preserved Pairwise Jaccard Index

The Preserved Pairwise Jaccard Index (PPJI) is designed to quantify the added value of the paired integrative analysis in identifying cell subtypes (better granularity) by providing a summary value over the Pairwise Jaccard Index (PJI) matrix (Fig.2). Briefly, PPJI provides a number between 0 and 1 that quantifies whether the integrated space provides a finer cell-type definition than the cell-type definitions generated from a single omic (e.g., dt1). For a given cluster in dt1, the sum over the associated PJI matrix is "1" if the cluster is preserved or separates perfectly into sub-clusters (Fig.1). Thus, PPJI takes the average of these sums as a summary. PPJI is computed as follows (see Supplementary Materials and Methods for a more detailed explanation):

$$\mathrm{PPJI} = \frac{1}{|A|}\sum_{i \in A}\sum_{j \in B}\frac{|a_i \cap b_j|}{|a_i \cup b_j|}$$

where, for each pair of clusters $i \in A$ and $j \in B$, $a_i$ and $b_j$ denote the sets of cells in clusters $i$ and $j$, respectively, when investigating how clustering A projects onto the reference clustering B (see Fig.2). Therefore, values closer to 1 denote that the clusters in dt1 are conserved or split into sub-clusters. It is important to note that the evaluation function PPJI must be combined with a comparison between the number of clusters in dt1 and in the integrated space.
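A compact PPJI sketch is shown below, assuming integer cluster labels per cell; the exact package implementation may differ in bookkeeping.

```python
# PPJI sketch: labels_a is the reference clustering (e.g., dt1),
# labels_b the clustering being evaluated (e.g., the integrated SLS).
import numpy as np

def ppji(labels_a, labels_b):
    """Average, over clusters in A, of the summed pairwise Jaccard indices
    against clusters in B. Values near 1 mean A's clusters are preserved
    or split cleanly into B sub-clusters."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    sums = []
    for i in np.unique(labels_a):
        a = set(np.where(labels_a == i)[0])
        s = 0.0
        for j in np.unique(labels_b):
            b = set(np.where(labels_b == j)[0])
            s += len(a & b) / len(a | b)   # pairwise Jaccard index
        sums.append(s)
    return float(np.mean(sums))
```

If a dt1 cluster splits perfectly into SLS sub-clusters, each term contributes $|b_j|/|a_i|$ and the inner sum equals 1, matching the behavior described above.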

2.2.2 Synergy model performance ranking

To obtain a summary score of integration performance, a weighted average of the three metrics was calculated (see Supplementary Materials and Methods), where each metric is scaled and weighted equally. To do this, each time a set of combinations is compared, the results of training the AE 10 times per combination are pooled. Q1, Q2, and PPJI are then scaled and averaged with equal weights to generate a final score that numerically represents overall performance.
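As a hedged sketch of this ranking, the snippet below min-max scales each metric across the pooled runs and averages them with equal weights; the exact scaling used in the paper is detailed in the Supplement, and the inversion of the error metrics Q1 and Q2 is an assumption made so that higher scores are better throughout.

```python
# Synergy-score sketch, assuming min-max scaling across pooled runs.
import numpy as np

def synergy_score(q1_runs, q2_runs, ppji_runs):
    """Scale each metric to [0, 1] across all pooled runs and average with
    equal weights. Q1 and Q2 are errors, so they are inverted (assumption)."""
    def scale(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    return np.mean([1 - scale(q1_runs), 1 - scale(q2_runs),
                    scale(ppji_runs)], axis=0)
```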

2.2.3 Prediction-specific evaluation metrics

To evaluate the predictive power of LIBRA, the pred metric is used: Pearson correlation for scRNA-seq and the AUC-ROC for scATAC-seq. For the ROC calculation, the predicted scATAC-seq matrix was first binarized using 0.25 as the cut-off point (based on the data distribution): values greater than 0.25 are set to 1, and values less than or equal to 0.25 are set to 0. For more details on the implementation of this metric, see Supplementary Materials and Methods.
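The following is a minimal sketch of the two prediction metrics using scipy and scikit-learn; the 0.25 binarization cut-off follows the text above, and the flattened-array treatment is an assumption for readability.

```python
# Sketch of the prediction metrics on flattened dense arrays.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

def pred_rna(rna_true, rna_pred):
    """predRNA: Pearson correlation between known and predicted expression."""
    return pearsonr(rna_true.ravel(), rna_pred.ravel())[0]

def pred_atac(atac_true_binary, atac_pred):
    """predATAC: AUC-ROC after binarizing the predicted matrix at 0.25,
    as described above (values > 0.25 -> 1, otherwise 0)."""
    pred_bin = (atac_pred.ravel() > 0.25).astype(int)
    return roc_auc_score(atac_true_binary.ravel(), pred_bin)
```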

2.2.4 CITE-seq specific integration measurement

The last metric implemented is specific to CITE-seq [6] and measures integration performance. A set of 25 reference proteins is used. For each protein, the expression over the k-nearest neighboring cells (k = 20) found in the entire feature space of the reference protein dataset is compared, via Spearman and Pearson correlation, with the expression over the k-nearest neighboring cells obtained in the SLS produced by each method (LIBRA, Seurat4 [18], MOFA+ [17], totalVI [30], and BABEL [19]), for each of the 25 reference proteins.
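A hedged sketch of one plausible reading of this metric is given below: each cell's protein value is averaged over its k = 20 nearest neighbors, with neighbors found either in the full protein feature space or in a method's SLS, and the two neighbor-smoothed profiles are then correlated. Function and variable names are illustrative, not the package API.

```python
# Hedged sketch of the CITE-seq neighborhood metric (one interpretation).
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.neighbors import NearestNeighbors

def knn_smooth(space, protein_values, k=20):
    """Average each cell's protein value over its k nearest neighbors
    computed in the given embedding (protein space or SLS)."""
    nn = NearestNeighbors(n_neighbors=k).fit(space)
    _, idx = nn.kneighbors(space)
    return protein_values[idx].mean(axis=1)

def cite_seq_scores(protein_space, sls, protein_values, k=20):
    """Correlate neighbor-smoothed expression from the reference protein
    space against that from a method's shared latent space."""
    ref = knn_smooth(protein_space, protein_values, k)
    emb = knn_smooth(sls, protein_values, k)
    return pearsonr(ref, emb)[0], spearmanr(ref, emb)[0]
```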

2.3 LIBRA step-wise optimization

To identify the default-tuned hyperparameters of LIBRA, we combined the three quality measurements (Q1, Q2, PPJI) in the analysis of the SNARE-seq [1] adult mouse brain dataset (DS1). We iteratively optimized the following parameters: (i) Autoencoder-type configuration = AE-based framework, (ii) number of dimensions of the projected space = 10, (iii) peak-derived information for ATAC-seq, (iv) the ordering (dt1 = ATAC-seq and dt2 = RNA-seq), (v) using only the most variable features, and (vi) number of hidden layers = 2. Table S1 includes the values for each evaluation metric. In all cases, a weighted score combining Q1, Q2, and PPJI was computed to determine overall performance. Table S2 shows the final weighted score computed for each iteration of each combination. The best configuration was chosen based on its overrepresentation, relative to other configurations, among the 10 highest-scoring values at each parameter-selection step (see additional details in Supplementary Materials and Methods).

2.3.1 Comparing LIBRA with existing tools

Next, we compared the step-wise fine-tuned LIBRA on DS1 against the existing tools Seurat3 [5], Seurat4 [18], MOFA+ [17], totalVI [30], and BABEL [19]. For that comparison, we used the PPJI measure (Fig.2, D), which quantifies the added value of multi-omic integration when identifying cell sub-types (Fig.3). We observed that only Seurat4 [18] outperforms default-tuned LIBRA, and it does so minimally; inspecting the clusters reveals an overwhelming similarity, as shown in Fig. S1D, E. Interestingly, LIBRA outperforms the other deep learning frameworks, including a concatenation of both data-type matrices in an Autoencoder used to identify the shared latent space (unpaired AE). We investigated the cluster-specific markers from Seurat4 [18] and LIBRA to interrogate biological relevance. First, when taking Seurat4 [18] as a reference, the top markers identified by Seurat4 [18] are also recognized by LIBRA (Fig. S2A, B). LIBRA also identifies other markers (Fig. S2A, B). We conclude that both methodologies recover a similar level of resolution for clusters, cell subtypes, and their associated biomarkers.

2.3.2 Sensitivity analysis

Next, we evaluated the robustness of both Seurat4 [18] and LIBRA by reducing the number of cells. To do so, we randomly selected and removed a certain percentage of cells while calculating the PPJI in each case. As expected, reducing the number of cells decreased the performance of Seurat4 [18] and LIBRA (Fig.3). Interestingly, when reducing the number of cells, LIBRA performs significantly better than Seurat4 [18]. Further robustness analysis shows that LIBRA can maintain high accuracy against randomization of matching information, dropout, and overtraining (see Table S3).

2.4 Generalization of the results

To assess the generalizability of LIBRA, we compared it with other methodologies on a wide range of datasets: CITE-seq (Human Bone Marrow, DS2 [6]), PAIRED-seq (Mouse Adult Cerebral Cortex, DS3 [4]), SHARE-seq (Mouse Skin, DS4 [3]), 10X (PBMC, DS5 [5]), 10X Multiome (Human Bone Marrow, DS6 [25]), CITE-seq (Human Bone Marrow, DS7 [25]) and scNMT-seq (Mouse Embryonic Stem Cells, DS8 [7]). Further details are provided in Fig.3 and Table S4. A PPJI-based comparison was not feasible for DS8 because of the limited number of cells and, as a result, the very limited number of clusters identified. A general observation (Fig.3) is that fine-tuned Seurat4 [18] surpasses all other methodologies in most cases. However, the differences between Seurat4 [18], MOFA+ [17], and LIBRA are limited and depend on the dataset. BABEL [19] provides the worst results, except for DS3 when, as in DS1, the comparison is made against ATAC-seq. Interestingly, we found that DS3 ATAC-seq provides limited information on clusters, which is observed both in the clusters based on Seurat4 [18] integration and in LIBRA. However, BABEL [19] appears to prioritize the information from ATAC-seq in the integration, as shown in Fig. S3. It is relevant to note that BABEL [19] was developed for the prediction challenge, not for cell-type identification. We also observed that the normalization procedure (e.g., using SCT or not) has a limited effect on the PPJI analysis (see Table S5). In the case of DS5, the dataset with the largest number of cells (at the time this analysis was conducted), we observed that using "all features" instead of "most variable features" provided slightly better integration results (< 0.02 PPJI difference); consequently, we analyzed all methods using the "all features" option. It was not possible to run BABEL [19] within a reasonable time frame with the entire set of features on DS5.

As an extension of the current work, we compared LIBRA against the winning framework of the NeurIPS challenge (a concatenated AE) on dataset DS6. The concatenated AE obtained a resolution of 23 clusters, with PPJI scores of 0.72 and 0.64 for scRNA and scATAC, respectively. LIBRA with default hyperparameters obtained a resolution of 28 clusters, with PPJI scores of 0.79 and 0.67, respectively. We conclude that LIBRA outperforms the concatenated AE in the resolution and preservation of biological information in the SLS.

To evaluate LIBRA on other combinations of data modalities, we investigated prediction in CITE-seq. To that end, we estimated the expression of 25 proteins in the CITE-seq DS2 dataset [6] using the profiles of the neighboring cells, as conducted in the Seurat4 [18] analysis, and using the previously explained metric computed for each of the SLS components. Seurat4 [18], LIBRA, totalVI [30], and MOFA+ [17] returned the best results for 14, 11, 1, and 1 of the 25 antibodies, respectively, as shown in Fig.3. Overall, Seurat4 [18] provides the most stable results, followed by LIBRA.

2.5 Predictive power of LIBRA

While LIBRA is comparable to Seurat4 [18] in terms of PPJI, the added value of the LIBRA framework is its use as a predictive model. The generation of a LIBRA model for a paired dataset allows the prediction of unknown biomolecule profiles from single-omic single-cell data of the same biological system. Given dt1 = ATAC and dt2 = RNA, we quantified the predictive power for RNA profiles, predRNA, as the Pearson correlation between known and predicted profiles, as used in BABEL [19]. We acknowledge that predRNA can also be considered an evaluation measure for NN1. We compared the predRNA value between BABEL [19] and LIBRA on all datasets for which this was possible (Table S6); we observed that LIBRA outperforms BABEL [19] in all cases. We also observed that the prediction estimates are valid for all clusters and that the correlation per cluster is not associated with the number of cells in the cluster (Tables S7, S8). Again, we observed that LIBRA outperforms BABEL [19] when using RNA to predict ATAC-seq profiles (predATAC of 0.87 vs. 0.85; see Supplementary Materials and Methods).

2.5.1 Comparing running times

When comparing computational cost (Tab.1), Seurat4 [18] is the fastest. However, given that LIBRA training must be performed once or, at most, a few times in any single-cell multi-omics analysis, we consider the observed CPU time cost acceptable (Tab.1). In particular, although BABEL [19] and LIBRA are both AE-inspired methodologies, the more complex architecture of BABEL [19] makes it considerably more time-consuming.

2.6 Adaptative LIBRA: automatic dataset-specific fine-tuning for LIBRA

From an observation derived from the NeurIPS challenge, fine-tuning seems to be a necessary step to achieve optimal performance for all neural network architectures. To investigate this hypothesis further, we calculated LIBRA evaluation scores for different combinations of parameters, fixing the remaining parameters at given values, using the largest dataset, DS6. As shown in Fig.4, B, the different evaluation metrics may show different associations with the neural network parameters. Therefore, it is necessary to characterize the fitness landscape shown in Fig.4. In this example, aimed at optimizing the "integrative scores", we compared 423 models (each associated with a different LIBRA parameter setting) using a grid of vectors for the different hyperparameters (Table S9; Materials and Methods).

A first observation is that, as shown in Fig.4, fine-tuned LIBRA (aLIBRA) increased the PPJI scores for RNA (from 0.79 to 0.82) and ATAC (from 0.67 to 0.81), in the latter case outperforming Seurat4 [18]. See also the extended clustering definition in Fig.4.

A second observation is that fine-tuning can be used to further investigate the association between the parameter space and the tasks of prediction and integration. In the example shown in Fig.4, we observed that frameworks with fewer nodes yield better predictions; see the Methods section for the details of the two parameter sets. We investigated whether there were combinations, within the overlapping parameter space, that returned good evaluations on both criteria (see Fig.4). We observed that there are Pareto optima: frameworks competitive in both tasks.

Based on all these observations, we conclude that a predefined or even a step-wise model can be further improved when complemented by a fine-tuning strategy that fits the data (as is the case for aLIBRA).

2.7 LIBRA as a resource

LIBRA has been implemented as a Python package called sc-libra. It provides state-of-the-art performance while maintaining a competitive runtime (compared with other deep learning-based methods, which require more CPU time to perform similar training processes). In addition, LIBRA, in its aLIBRA variant, can train hundreds of models in parallel for data-driven parameter fine-tuning.

LIBRA is a modular toolbox and, in our experience, easy to use. All function outputs and the directory tree are generated "behind the scenes," and the required user interaction is very limited (Fig.5).

2.8 Limitations, future developments and considerations about the evaluation functions

In our evaluation of LIBRA, two limitations are of relevance: (i) the current version of LIBRA is limited to two data types, and (ii) the evaluation functions used, such as PPJI, are not a complete evaluation criterion for the integrative outcome. We observed that LIBRA is competitive compared with existing methodologies for more than two omics (multigrate [31] and multiVI [32]); this observation encourages the development of "> 2 omic data types" versions of LIBRA (see Fig. S4A). However, while performing the analysis, we also observed that measurements derived from challenges (e.g., NeurIPS' single-cell integration benchmark, scib [33]) provide different results from those derived from PPJI (which aims to quantify the added granularity derived from multi-omics). When using PPJI, we expect optimal multi-omic models to provide higher granularity (more clusters), as shown in Fig. S4B, C. However, in the case of optimal scib-derived models, we observed an association with fewer clusters (see Fig. S4D, E), in some cases fewer than the clusters derived from uni-omic analysis. Furthermore, the biological utility of the clusters identified by each method (and each evaluation function) differs (see Fig. S5). While the design and nature of PPJI and scib are different, the observed results reflect the need for further investigation of the evaluation criteria and of the biological value added.

3 CONCLUSIONS

To analyze single-cell multi-omics profiles [34], the research community needs powerful multi-omics data analysis software tools [8] capable of handling different combinations of omics. Moreover, these tools need to be adapted to different data modalities, to several challenges (multi-objective optimization) and to the specific characteristics of each dataset. To respond to this demand, we present LIBRA.

LIBRA is a tool that leverages paired single-cell information using an AE framework to address two fundamental challenges in the analysis of multi-modal single-cell data: identifying the joint space, thus facilitating cell-type resolution, and allowing prediction between different omics modalities. We observed that LIBRA competes with state-of-the-art tools in both tasks and is robust when the number of cells is reduced. Moreover, LIBRA's architecture and learning scheme generalize to any pair of omics. This allows LIBRA to be used in any biological context regardless of the nature of the biomolecule profiles used.

The limited CPU time demand of the model allows LIBRA to be easily fine-tuned to the characteristics of different datasets (with a considerable effect on performance), something that would be impossible with tools requiring more CPU time, such as totalVI [30], MOFA+ [17], or BABEL [19]. Furthermore, the simplicity of the LIBRA model, its limited CPU time requirements, and its scalability allow LIBRA to be combined with a fine-tuning strategy. aLIBRA significantly refines and improves the model output and, consequently, outperforms other methodologies. Furthermore, aLIBRA allows the identification, during fine-tuning, of frameworks that are competitive in both prediction and integration, as shown in Fig.4.

LIBRA’s limited computational time requirements make it a candidate for the analysis of large datasets such as the Human Cell Atlas [35], where the integration of this type of data involves additional technical complications (batch effects depending on technologies, laboratories, etc.).

In summary, LIBRA and aLIBRA are state-of-the-art tools for single-cell multi-modal prediction and projection analysis, with open-source implementations available in R and Python (Fig.5), together with tutorials. LIBRA is implemented as a Python package (on the PyPI repository) called sc-libra, allowing users to efficiently perform all proposed analyses and metrics on any pair of paired single-cell omics. Online documentation for sc-libra is provided as a user's guide for the package.

4 MATERIALS AND METHODS

4.1 Preprocessing of scRNA-seq data

Following Seurat guidelines, several cell- and feature-level quality filters were applied. Cells above the 90th or below the 10th percentile for the number of features or counts per cell were filtered out. Cells with counts in fewer than 201 genes were filtered out. Genes with counts in fewer than 5 cells were filtered out. Cells with more than 5% of reads mapping to mitochondrial genes were filtered out. When most variable genes (mvg) were used, the 2000 most variable genes were selected. Within each cell, the number of reads per gene was divided by the total number of reads in the cell and multiplied by a scale factor (10,000); then a log-transformation was applied. Feature subspaces were obtained from the most variable genes using principal component analysis (PCA) based on the top 15 principal components. Clustering was computed with the Louvain algorithm over the principal-component subspace. A bootstrap-subsampling Snakemake workflow was used to identify the optimal number of nearest neighbors and the resulting resolution; the ranges explored for these parameters were 8−16 and 0.6−1.4, respectively. We used a subsampling rate of 0.8 for 20 subsamples, which generated a total of 500 samples for analysis. The clustering was repeated 1000 times with the final settings to discard spurious clusters. More details are available in the Supplementary Materials and Methods section.

As a result, a robust latent space and clustering results were obtained for scRNA-seq, which can be used when comparing integrated-based approaches. The normalized and log-transformed scRNA-seq matrix will be the input to the LIBRA model.
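As a hedged illustration, the same filtering and normalization steps can be expressed with scanpy. The paper's pipeline follows Seurat, so the function names, the input path, and the mitochondrial-gene prefix below are assumptions rather than the exact commands used.

```python
# scanpy sketch of the scRNA-seq preprocessing steps described above.
import scanpy as sc

adata = sc.read_10x_mtx("path/to/rna_counts/")        # placeholder input path
sc.pp.filter_cells(adata, min_genes=201)              # drop cells with counts in <201 genes
sc.pp.filter_genes(adata, min_cells=5)                # drop genes with counts in <5 cells
adata.var["mt"] = adata.var_names.str.startswith("MT-")  # assumed human gene prefix
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], percent_top=None, inplace=True)
adata = adata[adata.obs["pct_counts_mt"] < 5].copy()  # <5% mitochondrial reads
sc.pp.normalize_total(adata, target_sum=1e4)          # per-cell scaling, factor 10,000
sc.pp.log1p(adata)                                    # log-transformation
sc.pp.highly_variable_genes(adata, n_top_genes=2000)  # 2000 most variable genes
sc.tl.pca(adata, n_comps=15)                          # top 15 principal components
sc.pp.neighbors(adata, n_neighbors=12)                # 8-16 explored in the paper
sc.tl.louvain(adata, resolution=1.0)                  # 0.6-1.4 explored in the paper
```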

4.2 Preprocessing of scATAC-seq data

scATAC-seq data were preprocessed similarly to scRNA-seq data, except as described below. For scATAC-seq, combined Seurat and Signac guidelines were used.

Because of the greater sparsity of scATAC-seq, peaks profiled in fewer than 4 cells were filtered out. The data matrix was normalized using the term frequency-inverse document frequency (TF-IDF) method. For scATAC-seq, we used the entire feature space. Reduced feature subspaces were computed over the full peak feature space using singular value decomposition (SVD), providing a latent semantic indexing (LSI) latent space with 50 components. The values and functions employed are detailed in the Supplementary Materials and Methods section.
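The TF-IDF plus SVD (LSI) reduction can be sketched with scikit-learn as below; Signac's exact TF-IDF variant may differ slightly from the scikit-learn default, so this is an approximation.

```python
# TF-IDF normalization followed by SVD (LSI) for a cells x peaks count matrix.
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import TruncatedSVD

def tfidf_lsi(peak_matrix, n_components=50):
    """peak_matrix: sparse cells x peaks count matrix.
    Returns the 50-component LSI embedding used as the scATAC latent space."""
    tfidf = TfidfTransformer().fit_transform(peak_matrix)
    return TruncatedSVD(n_components=n_components).fit_transform(tfidf)
```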

The Signac activity-estimation approach was used to conduct the Seurat integrative analysis. A region of 2000 base pairs upstream of the transcription start site was used for the "peak to gene relation" estimation. The GRCh38 and mm10 reference genomes were used for human and mouse, respectively. See Supplementary Materials and Methods for the Seurat integration parameters.

scATAC-seq reduced latent space and clustering results are used when evaluating the integrative analysis. A normalized scATAC-seq matrix will serve as input to the LIBRA model.

4.2.1 Preprocessing of CITE-seq data

The initial pipeline for the analysis of CITE-seq raw data was similar to previous data modalities. Differences are detailed below.

The entire protein space was used instead of selecting a most-variable-protein subspace. In addition, the protein expression measurements for each cell were normalized using the centered log-ratio (CLR) transformation. The values and functions employed are available in the Supplementary Materials and Methods section. The reduced latent space and clustering results obtained are used as the antibody-derived tag (scADT) reference for later performance metric computation. The normalized scADT matrix serves as input to the LIBRA framework.
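A minimal numpy sketch of a per-cell CLR transformation is shown below; Seurat's implementation differs in small numerical details (e.g., handling of zeros), so this is an approximation rather than the exact transform applied.

```python
# Per-cell centered log-ratio (CLR) sketch for a cells x proteins ADT matrix.
import numpy as np

def clr(adt_counts):
    """Returns CLR-transformed protein values: each cell's log counts
    centered by that cell's mean log count."""
    log_x = np.log1p(adt_counts)                      # log(1 + x) for stability
    return log_x - log_x.mean(axis=1, keepdims=True)  # subtract per-cell mean of logs
```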

4.2.2 Adaptative fine-tuning, aLIBRA

The first version of the LIBRA framework was instantiated using a step-wise optimization procedure over a single dataset; as a result, a set of parameters was selected, and this framework was applied to all datasets. The step-wise optimization is detailed in the Results section. However, such a parameter combination may perform sub-optimally for other datasets or combinations of data modalities. Based on this assumption, we combined LIBRA with an automatic grid-based fine-tuning strategy to identify the optimal set of parameters for any given dataset; we denote this implementation adaptive LIBRA (aLIBRA).

In aLIBRA, the optimal combination of the number of layers, number of nodes, alpha, dropout, and mid-layer size is identified for each of NN1 and NN2. A non-linear shrinkage rule, $\mathrm{Layersize}_N = \mathrm{input\,layer\,size} / 2^N$, determines the hidden layer sizes for the encoding part and, inverted, for the decoding part of the autoencoder, where $N$ denotes the position of a layer in the network. This consideration prevents LIBRA from generating layers smaller than the "middle layer" size for very large NNs (which may be necessary for very large datasets).

Fine-tuning is executed separately for each of the two tasks, integration and prediction. For integration, aLIBRA considered the following values: number of layers (1, 2, 3, 4, 5, 6), number of nodes (256, 512, 1024, 2048), alpha (0.1, 0.3, 0.5), dropout (0.1, 0.2, 0.3, 0.4), and mid-layer size (10, 50, 70). For prediction, aLIBRA considered: number of layers (1, 2), number of nodes (128, 256, 512), alpha (0.05, 0.1, 0.3), dropout (0.1, 0.2), batch size (32, 64, 128), and mid-layer size (10, 30, 50, 70). These options are customizable in the Python implementation, as sketched below.
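The loop below sketches what this grid search does for the integration task. `train_and_score` is a hypothetical wrapper standing in for one LIBRA training run plus the combined evaluation score; in the actual package, these evaluations run in parallel rather than sequentially.

```python
# Grid-based fine-tuning sketch for the integration task, using the value
# grids listed above; `train_and_score` is a hypothetical helper.
import itertools

grid = {"n_layers": [1, 2, 3, 4, 5, 6],
        "n_nodes": [256, 512, 1024, 2048],
        "alpha": [0.1, 0.3, 0.5],
        "dropout": [0.1, 0.2, 0.3, 0.4],
        "mid_size": [10, 50, 70]}

def layer_sizes(input_size, n_layers, mid_size):
    # Non-linear shrinkage: Layersize_N = input_size / 2^N, floored at the
    # mid-layer size so deep encoders never shrink below the middle layer.
    return [max(input_size // 2 ** n, mid_size) for n in range(1, n_layers + 1)]

best = None
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    score = train_and_score(**params)      # hypothetical training wrapper
    if best is None or score > best[0]:
        best = (score, params)
```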

The fine-tuning of aLIBRA has been implemented with a parallelization strategy to decrease the computation time requirements. For further details, see Supplementary Materials and Methods.

References

[1]

Chen, S., Lake, B. B., Zhang, K. (2019). High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol., 37: 1452–1457

[2]

Cao, J., Cusanovich, D. A., Ramani, V., Aghamirzaie, D., Pliner, H. A., Hill, A. J., Daza, R. M., McFaline-Figueroa, J. L., Packer, J. S., Christiansen, L., et al. (2018). Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science, 361: 1380–1385

[3]

Ma, S., Zhang, B., LaFave, L. M., Earl, A. S., Chiang, Z., Hu, Y., Ding, J., Brack, A., Kartha, V. K., Tay, T., et al. (2020). Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell, 183: 1103–1116.e20

[4]

Zhu, C., Yu, M., Huang, H., Juric, I., Abnousi, A., Hu, R., Lucero, J., Behrens, M. M., Hu, M., Ren, B. (2019). An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat. Struct. Mol. Biol., 26: 1063–1070

[5]

Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W. M., Hao, Y., Stoeckius, M., Smibert, P., Satija, R. (2019). Comprehensive integration of single-cell data. Cell, 177: 1888–1902.e21

[6]

Stoeckius, M., Hafemeister, C., Stephenson, W., Houck-Loomis, B., Chattopadhyay, P. K., Swerdlow, H., Satija, R. (2017). Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods, 14: 865–868

[7]

Clark, S. J., Argelaguet, R., Kapourani, C. A., Stubbs, T. M., Lee, H. J., Alda-Catalinas, C., Krueger, F., Sanguinetti, G., Kelsey, G., Marioni, J. C., et al. (2018). scNMT-seq enables joint profiling of chromatin accessibility, DNA methylation and transcription in single cells. Nat. Commun., 9: 781

[8]

Argelaguet, R., Cuomo, A. S. E., Stegle, O., Marioni, J. C. (2021). Computational principles and challenges in single-cell data integration. Nat. Biotechnol., 39: 1202–1215

[9]

Rohart, F., Gautier, B., Singh, A., Lê Cao, K. A. (2017). mixOmics: an R package for 'omics feature selection and multiple data integration. PLOS Comput. Biol., 13: e1005752

[10]

Argelaguet, R., Velten, B., Arnol, D., Dietrich, S., Zenz, T., Marioni, J. C., Buettner, F., Huber, W. (2018). Multi-omics factor analysis−a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol., 14: e8124

[11]

Lock, E. F., Hoadley, K. A., Marron, J. S., Nobel, A. B. (2013). Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Ann. Appl. Stat., 7: 523–542

[12]

Teschendorff, A. E., Jing, H., Paul, D. S., Virta, J. (2018). Tensorial blind source separation for improved analysis of multi-omic data. Genome Biol., 19: 76

[13]

Gomez-Cabrero, D., Tarazona, S., Ferreirós-Vidal, I., Ramirez, R. N., Company, C., Schmidt, A., Reijmers, T., Paul, V. V. S., Marabita, F., Rodríguez-Ubreva, J., et al. (2019). STATegra, a comprehensive multi-omics dataset of B-cell differentiation in mouse. Sci. Data, 6: 256

[14]

Stegle, O., Teichmann, S. A., Marioni, J. C. (2015). Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet., 16: 133–145

[15]

Perkel, J. (2021). Single-cell analysis enters the multiomics age. Nature, 595: 614–616

[16]

Marx, V. (2022). How single-cell multi-omics builds relationships. Nat. Methods, 19: 142–146

[17]

Argelaguet, R., Arnol, D., Bredikhin, D., Deloro, Y., Velten, B., Marioni, J. C., Stegle, O. (2020). MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol., 21: 111

[18]

Hao, Y., Hao, S., Andersen-Nissen, E., Mauck, W. M., Zheng, S., Butler, A., Lee, M. J., Wilk, A. J., Darby, C., Zager, M., et al. (2021). Integrated analysis of multimodal single-cell data. Cell, 184: 3573–3587.e29

[19]

Wu, K. E., Yost, K. E., Chang, H. Y., Zou, J. (2021). BABEL enables cross-modality translation between multiomic profiles at single-cell resolution. Proc. Natl. Acad. Sci. USA, 118: e2023070118

[20]

Fortelny, N., Bock, C. (2020). Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data. Genome Biol., 21: 190

[21]

Ravindra, N., Sehanobish, A., Pappalardo, J. L., Hafler, D. A., et al. (2020). Disease state prediction from single-cell data using graph attention networks. In: Proceedings of the 2020 ACM Conference on Health, Inference, and Learning (ACM CHIL 2020), pp. 121–130

[22]

Kimmel, J. C., Kelley, D. R. (2021). Semisupervised adversarial neural networks for single-cell classification. Genome Res., 31: 1781–1793

[23]

Sargent, B., Jafari, M., Marquez, G., Mehta, A. S., Sun, Y. H., Yang, H. Y., Zhu, K., Isseroff, R. R., Zhao, M. (2022). A machine learning based model accurately predicts cellular response to electric fields in multiple cell types. Sci. Rep., 12: 9912

[24]

Yang, F., Wang, W., Wang, F., Fang, Y., Tang, D., Huang, J., Lu, H. (2022). scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell., 4: 852–866

[25]

Luecken, M. D., Burkhardt, D., Cannoodt, R., Lance, C., Agrawal, A., Aliee, H., Chen, A., Deconinck, L., Detweiler, A., et al. (2021). A sandbox for prediction and integration of DNA, RNA, and protein data in single cells. In: NeurIPS 2021 Track on Datasets and Benchmarks, pp. 1–13

[26]

Lockett, A. (2020). No free lunch theorems. Nat. Comput. Ser., 1: 287–322

[27]

Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734

[28]

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., et al. (2011). Scikit-learn: machine learning in Python. J. Mach. Learn. Res., 12: 2825–2830

[29]

Xu, B., Wang, N., Chen, T., Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv, 1505.00853v2

[30]

Gayoso, A., Steier, Z., Lopez, R., Regier, J., Nazor, K. L., Streets, A., Yosef, N. (2021). Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods, 18: 272–282

[31]

Lotfollahi, M., Litinetskaya, A., Theis, F. J. (2022). Multigrate: single-cell multi-omic data integration. bioRxiv, 2022.03.16.484643

[32]

Ashuach, T., Gabitto, M. I., Jordan, M. I., Yosef, N. (2021). MultiVI: deep generative model for the integration of multi-modal data. bioRxiv, 2021.08.20.457057

[33]

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., et al. (2022). Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods, 19: 41–50

[34]

Mimitou, E. P., Lareau, C. A., Chen, K. Y., Zorzetto-Fernandes, A. L., Hao, Y., Takeshima, Y., Luo, W., Huang, T. S., Yeung, B. Z., Papalexi, E., et al. (2021). Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol., 39: 1246–1258

[35]

Rood, J. E., Maartens, A., Hupalowska, A., Teichmann, S. A. (2022). Impact of the Human Cell Atlas on medicine. Nat. Med., 28: 2486–2496

RIGHTS & PERMISSIONS

The Author(s). Published by Higher Education Press.


Supplementary files

QB-22318-OF-GCD_suppl_1

QB-22318-OF-GCD_suppl_2
