Optimizing enzyme thermostability by combining multiple mutations using protein language model

Jiahao Bian, Pan Tan, Ting Nie, Liang Hong, Guang-Yu Yang

mLife, 2024, Vol. 3, Issue (4): 492-504. DOI: 10.1002/mlf2.12151

ORIGINAL RESEARCH


Abstract

Optimizing enzyme thermostability is essential for advancements in protein science and industrial applications. Currently, (semi-)rational design and random mutagenesis methods can accurately identify single-point mutations that enhance enzyme thermostability. However, complex epistatic interactions often arise when multiple mutation sites are combined, which can lead to the complete inactivation of combinatorial mutants. As a result, constructing an optimized enzyme often requires repeated rounds of design to incorporate single mutation sites incrementally, which is highly time-consuming. In this study, we developed an AI-aided strategy for enzyme thermostability engineering that efficiently recombines beneficial single-point mutations. We used thermostability data from creatinase, comprising 18 single-point mutants, 22 double-point mutants, 21 triple-point mutants, and 12 quadruple-point mutants. With these data as input, we applied a temperature-guided protein language model, Pro-PRIME, to learn epistatic features and design combinatorial mutants. After two rounds of design, we obtained 50 combinatorial mutants with superior thermostability, a success rate of 100%. The best mutant, 13M4, contained 13 mutation sites and retained nearly the full catalytic activity of the wild type. It showed a 10.19°C increase in melting temperature and an ∼655-fold increase in half-life at 58°C. Additionally, the model successfully captured epistasis in high-order combinatorial mutants, including sign epistasis (K351E) and synergistic epistasis (D17V/I149V). We elucidated the mechanism of long-range epistasis in detail using a dynamics cross-correlation matrix method. Our work provides an efficient framework for designing enzyme thermostability and studying high-order epistatic effects in protein-directed evolution.
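The distinction the abstract draws between additive, sign, and synergistic epistasis can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: it assumes stability is measured as a change in melting temperature (ΔTm, in °C) relative to the wild type, and it uses one common convention in which the additive expectation for a double mutant is the sum of the two single-mutant effects. The function name `classify_pairwise_epistasis`, the tolerance `tol`, and all numeric values are hypothetical and chosen for illustration only; none of them are taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpistasisResult:
    expected: float  # additive expectation for the double mutant (°C)
    epsilon: float   # observed minus expected (°C)
    kind: str        # "additive", "sign", "synergistic", or "antagonistic"


def classify_pairwise_epistasis(d_a: float, d_b: float, d_ab: float,
                                tol: float = 0.5) -> EpistasisResult:
    """Classify the interaction between mutations A and B.

    d_a, d_b, d_ab are ΔTm values relative to the wild type.
    tol is an assumed noise threshold (hypothetical, in °C).
    """
    expected = d_a + d_b
    epsilon = d_ab - expected

    # Effect of each mutation when the other one is already present.
    a_in_b_background = d_ab - d_b
    b_in_a_background = d_ab - d_a

    if abs(epsilon) <= tol:
        kind = "additive"
    elif d_a * a_in_b_background < 0 or d_b * b_in_a_background < 0:
        # A mutation changes from harmful to helpful (or the reverse)
        # depending on the background it is placed in.
        kind = "sign"
    elif epsilon > 0:
        kind = "synergistic"
    else:
        kind = "antagonistic"
    return EpistasisResult(expected, epsilon, kind)


if __name__ == "__main__":
    # Illustrative ΔTm values only; these are NOT measurements from the paper.
    examples = {
        "both help, combined gain exceeds the sum": (1.0, 1.5, 4.0),
        "A hurts alone but helps next to B": (-1.0, 2.0, 3.5),
        "effects simply add up": (1.0, 2.0, 3.2),
        "combined gain falls short of the sum": (2.0, 2.0, 2.5),
    }
    for label, (d_a, d_b, d_ab) in examples.items():
        result = classify_pairwise_epistasis(d_a, d_b, d_ab)
        print(f"{label}: {result.kind} (epsilon = {result.epsilon:+.2f} °C)")
```

Terminology for epistasis varies between fields, so the labels above should be read as one reasonable convention. The sign check is applied before the magnitude check so that a mutation whose effect reverses in a new background is reported as sign epistasis even when the combined mutant is more stable than expected.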

Keywords

combinatorial mutants / creatinase / epistasis / protein language model / thermostability

Cite this article

Jiahao Bian, Pan Tan, Ting Nie, Liang Hong, Guang-Yu Yang. Optimizing enzyme thermostability by combining multiple mutations using protein language model. mLife, 2024, 3(4): 492-504. DOI: 10.1002/mlf2.12151

RIGHTS & PERMISSIONS

© 2024 The Author(s). mLife published by John Wiley & Sons Australia, Ltd on behalf of Institute of Microbiology, Chinese Academy of Sciences.
