Prediction of Synthetic Lethality in Escherichia coli Based on Feature Engineering through Graph Embedding

Qian Xu, Yimiao Feng, Haixia Guo, Yawei Su, Xiaoru Chen, Haoran Sun, Jing Feng, Fengbiao Guo

eMicrobe ›› 2026, Vol. 2 ›› Issue (1) : 6

DOI: 10.53941/emicrobe.2026.100006
Original article

Abstract

Synthetic lethality (SL) is a genetic interaction in which the simultaneous inactivation of two individually non-lethal genes causes cell death. Because experimental screening is costly and time-consuming, computational prediction has become a primary research tool. Machine learning methods are now widely used in SL research, and discovering effective features that improve prediction accuracy remains a key challenge. We propose an SL prediction method based on graph embedding. First, we transformed five types of raw omics data into graph structures to capture the complex associations among genes. Then, using graph embedding, we extracted feature information for each gene and constructed the feature representation of each SL pair by applying mathematical operations to the gene features. Finally, unlike graph neural network (GNN) approaches, which perform inference on a single graph, we used machine learning classifiers to discriminate positive from negative samples. Our method achieved a higher AUC than GNN-based baseline methods. Overall, this study is the first to propose a prediction model for Escherichia coli (E. coli) SLs that combines the advantages of graph embedding techniques and classifier ensembles; it improves the accuracy and reliability of prediction and provides new perspectives and methods for this field.
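The step that turns two per-gene embeddings into one pair-level feature vector can be illustrated with a minimal sketch. The abstract does not state which mathematical operations were used, so the four symmetric operators below (Hadamard product, average, and element-wise L1 and L2 distances, as commonly used for edge features in the node2vec literature) are an assumption, and the gene names, embedding values, and the function name `pair_features` are hypothetical.

```python
def pair_features(u, v, op="hadamard"):
    """Combine two gene embeddings into one pair feature vector.

    All four operators are symmetric, so the result does not depend
    on the order in which the two genes of a pair are given.
    NOTE: the choice of operators is an assumption for illustration,
    not taken from the paper.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    if op == "hadamard":   # element-wise product
        return [a * b for a, b in zip(u, v)]
    if op == "average":    # element-wise mean
        return [(a + b) / 2 for a, b in zip(u, v)]
    if op == "l1":         # element-wise absolute difference
        return [abs(a - b) for a, b in zip(u, v)]
    if op == "l2":         # element-wise squared difference
        return [(a - b) ** 2 for a, b in zip(u, v)]
    raise ValueError(f"unknown operator: {op}")


# Hypothetical 3-dimensional embeddings for two placeholder genes.
embeddings = {
    "geneA": [0.5, -1.0, 2.0],
    "geneB": [2.0, 0.5, -1.0],
}

for op in ("hadamard", "average", "l1", "l2"):
    print(op, pair_features(embeddings["geneA"], embeddings["geneB"], op))
```

Vectors built this way for known SL pairs (positives) and sampled non-SL pairs (negatives) could then be passed to any standard classifier. A symmetric operator is a natural choice here because an SL pair is unordered.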

Keywords

synthetic lethality / Escherichia coli / machine learning / graph embedding

Cite this article

Qian Xu, Yimiao Feng, Haixia Guo, Yawei Su, Xiaoru Chen, Haoran Sun, Jing Feng, Fengbiao Guo. Prediction of Synthetic Lethality in Escherichia coli Based on Feature Engineering through Graph Embedding. eMicrobe, 2026, 2(1): 6. DOI: 10.53941/emicrobe.2026.100006


Author Contributions

Q.X. and Y.F.: conceptualization, methodology; Y.S.: validation; H.G.: formal analysis; Q.X.: data curation, writing (original draft preparation); Y.F.: writing (review and editing); X.C.: visualization; H.S. and J.F.: supervision; F.G.: project administration, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (grant no. 32370696).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets and source code implementing our method are available at https://github.com/Christal6/ECSLPredict/ (accessed on 3 March 2025).

Acknowledgments

We thank Hongtu Cui and Xiang Lian at the Guo lab for their useful discussions and valuable suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Use of AI and AI-Assisted Technologies

No AI tools were utilized for this paper.

