Discovery, design, and engineering of enzymes based on molecular retrobiosynthesis

Ancheng Chen, Xiangda Peng, Tao Shen, Liangzhen Zheng, Dong Wu, Sheng Wang

mLife, 2025, Vol. 4, Issue 2: 107-125.

DOI: 10.1002/mlf2.70009
REVIEW

Abstract

Biosynthesis, the use of biological systems to synthesize chemical compounds, has emerged as a revolutionary solution to 21st-century challenges owing to its environmental sustainability, scalability, and high stereoselectivity and regioselectivity. Recent advances in artificial intelligence (AI) are accelerating biosynthesis by enabling the intelligent design, construction, and optimization of enzymatic reactions and biological systems. We first introduce molecular retrosynthesis route planning for biochemical pathway design, including single-step retrosynthesis algorithms and AI-based chemical retrosynthesis route design tools. We highlight the advantages and challenges of large language models in addressing the sparsity of chemical data. We then review enzyme discovery methods based on sequence and structure alignment techniques; breakthroughs in AI-based structure prediction are expected to significantly improve the accuracy of enzyme discovery. We also summarize methods for de novo enzyme generation for nonnatural or orphan reactions, focusing on AI-based enzyme functional annotation and on enzyme discovery techniques based on reaction or small-molecule similarity. Turning to enzyme engineering, we discuss strategies to improve enzyme thermostability, solubility, and activity, as well as applications of AI in these areas. The shift from traditional experiment-driven models to data-driven and computationally driven intelligent models is already underway. Finally, we outline potential challenges and offer a perspective on future research directions. We envision expanded applications of biocatalysis in drug development, green chemistry, and complex molecule synthesis.

Keywords

artificial intelligence / enzyme design / enzyme discovery / enzyme engineering / molecular retrosynthesis planning

Cite this article

Ancheng Chen, Xiangda Peng, Tao Shen, Liangzhen Zheng, Dong Wu, Sheng Wang. Discovery, design, and engineering of enzymes based on molecular retrobiosynthesis. mLife, 2025, 4(2): 107-125 DOI:10.1002/mlf2.70009

RIGHTS & PERMISSIONS

2025 The Author(s). mLife published by John Wiley & Sons Australia, Ltd on behalf of Institute of Microbiology, Chinese Academy of Sciences.
