Effectively Preserving Biological Variations in Multi-Batch and Multi-Condition Single-Cell Data Integration

Qingbin Zhou, Tao Ren, Fan Yuan, Jiating Yu, Jiacheng Leng, Jiahao Song, Duanchen Sun, Ling-Yun Wu

CSIAM Trans. Life Sci., 2026, Vol. 2, Issue (1): 177-202. DOI: 10.4208/csiam-ls.SO-2025-0025
Research Articles

Abstract

Understanding phenotypic differences at the cell level is critical for comprehending the pathogenesis of complex diseases. However, biological variations are often obscured by batch effects, which makes it challenging to integrate multi-batch and multi-condition single-cell datasets. Here, we present scFLASH, a deep learning-based model specifically designed to explore single-cell biological variations while correcting undesired batch effects. scFLASH employs a conditional variational autoencoder with adversarial training to separate biological variations from technical noise and introduces a penalized condition classifier to preserve condition-specific biological signals. In comprehensive benchmarking evaluations, scFLASH shows superior integration performance compared with other state-of-the-art methods. Applying scFLASH to Alzheimer’s disease, COVID-19, and diabetes datasets, we demonstrate that it is applicable to various scenarios, effectively integrating datasets with two or more conditions and different batch sources. scFLASH can enhance gene expression profiles and identify condition-related cell subpopulations, facilitating downstream analyses and offering biological insights into the cellular mechanisms of disease pathology.
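The abstract names three ingredients (a conditional variational autoencoder, adversarial training against batch, and a condition classifier) but does not state the loss function. As a rough illustration only, the sketch below shows one common way such terms could be combined for a single cell. The function names, the weights `beta`, `lambda_batch`, and `lambda_cond`, and the sign convention are all assumptions made for this example and are not taken from scFLASH; in particular, the form of the penalty on the condition classifier is not specified in the abstract and is not modeled here.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_norm - logits[label]


def encoder_objective(recon_error, mu, logvar,
                      batch_logits, batch_label,
                      cond_logits, cond_label,
                      beta=1.0, lambda_batch=1.0, lambda_cond=1.0):
    """Hypothetical per-cell loss for the encoder/decoder (an assumption,
    not the published objective).

    recon_error + beta * KL        : standard VAE terms
    - lambda_batch * batch CE      : adversarial term; the encoder is rewarded
                                     when a batch discriminator does poorly
    + lambda_cond * condition CE   : keeps condition labels predictable from
                                     the latent code
    """
    kl = gaussian_kl(mu, logvar)
    batch_ce = cross_entropy(batch_logits, batch_label)
    cond_ce = cross_entropy(cond_logits, cond_label)
    return recon_error + beta * kl - lambda_batch * batch_ce + lambda_cond * cond_ce
```

In an adversarial setup of this kind, the batch discriminator itself would be trained separately to minimize its own cross-entropy, so the two objectives pull in opposite directions on the batch term while agreeing on the condition term.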

Keywords

Biological variations / data integration / batch correction / deep learning

Cite this article

Qingbin Zhou, Tao Ren, Fan Yuan, Jiating Yu, Jiacheng Leng, Jiahao Song, Duanchen Sun, Ling-Yun Wu. Effectively Preserving Biological Variations in Multi-Batch and Multi-Condition Single-Cell Data Integration. CSIAM Trans. Life Sci., 2026, 2(1): 177-202. DOI: 10.4208/csiam-ls.SO-2025-0025


