Predicting a Protein’s Stability under a Million Mutations: Conclusion, Acknowledgement & References

Protein engineering is the discipline of mutating a natural protein sequence to improve properties for industrial and pharmaceutical applications.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Jeffrey Ouyang-Zhang, UT Austin

(2) Daniel J. Diaz, UT Austin

(3) Adam R. Klivans, UT Austin

(4) Philipp Krähenbühl, UT Austin

7 Conclusion

We present a method that efficiently scales thermodynamic stability prediction from single mutations to higher-order mutations. Our key insight is that the effects of mutations on the same protein are correlated. Thus, for a target protein, it suffices to run a deep backbone once and decode the effects of all mutations simultaneously using a shallow decoder. With the AlphaFold model as our backbone, our method outperforms existing methods on a variety of single and multiple mutation benchmarks. Our method scales to millions of mutations with minimal computational overhead and runs in a fraction of the time required by prior methods.
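The "run the backbone once, decode everything with a shallow head" idea can be illustrated with a toy sketch. This is not the paper's architecture: `toy_backbone`, `make_decoder`, `FEATURE_DIM`, and the random linear weights are all hypothetical stand-ins, chosen only to show that every single-mutation score can be read off one cached set of per-residue features.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FEATURE_DIM = 8  # hypothetical feature width, for illustration only

backbone_calls = 0  # counts how often the expensive step runs


def toy_backbone(sequence, rng):
    """Stand-in for a deep backbone: returns one feature vector per residue."""
    global backbone_calls
    backbone_calls += 1
    return [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in sequence]


def make_decoder(rng):
    """Hypothetical shallow decoder: one linear weight vector per amino acid."""
    return {aa: [rng.gauss(0.0, 0.1) for _ in range(FEATURE_DIM)]
            for aa in AMINO_ACIDS}


def decode_all_single_mutations(sequence, features, decoder):
    """Score every (position, new amino acid) pair from the cached features."""
    scores = {}
    for pos, (wild_type, feat) in enumerate(zip(sequence, features)):
        wt_score = sum(w * f for w, f in zip(decoder[wild_type], feat))
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # a residue mutated to itself is not a mutation
            mut_score = sum(w * f for w, f in zip(decoder[aa], feat))
            # Toy score: decoded value of the mutant relative to the wild type.
            scores[(pos, aa)] = mut_score - wt_score
    return scores


rng = random.Random(0)
sequence = "MKTAYIAKQR"  # arbitrary example sequence
features = toy_backbone(sequence, rng)  # expensive step, executed once
decoder = make_decoder(rng)
single_scores = decode_all_single_mutations(sequence, features, decoder)
```

In this sketch the cost of the backbone is paid once per protein, while the number of decoded scores grows with the number of candidate mutations (19 per position here). A decoder for higher-order mutations would reuse the same cached features in the same way.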


8 Acknowledgements

This work is supported by the NSF AI Institute for Foundations of Machine Learning (IFML).


References

[1] Randomization of genes by pcr mutagenesis. Genome research 2(1), 28–33 (1992) 2


[2] Adams, J.P., Brown, M.J., Diaz-Rodriguez, A., Lloyd, R.C., Roiban, G.D.: Biocatalysis: A Pharma Perspective. Advanced Synthesis and Catalysis 361(11), 2421–2432 (2019). https://doi.org/10.1002/adsc.201900424 1


[3] Ahdritz, G., Bouatta, N., Kadyan, S., Xia, Q., Gerecke, W., O’Donnell, T.J., Berenberg, D., Fisk, I., Zanichelli, N., Zhang, B., Nowaczynski, A., Wang, B., Stepniewska-Dziubinska, M.M., Zhang, S., Ojewole, A., Guney, M.E., Biderman, S., Watkins, A.M., Ra, S., Lorenzo, P.R., Nivon, L., Weitzner, B., Ban, Y.E.A., Sorger, P.K., Mostaque, E., Zhang, Z., Bonneau, R., AlQuraishi, M.: Openfold: Retraining alphafold2 yields new insights into its learning mechanisms and capacity for generalization. bioRxiv (2022). https://doi.org/10.1101/2022.11.20.517210 6


[4] Arnold, F.H.: Design by directed evolution. Accounts of chemical research 31(3), 125–131 (1998) 1, 2, 3


[5] Bell, E.L., Finnigan, W., France, S.P., Green, A.P., Hayes, M.A., Hepworth, L.J., Lovelock, S.L., Niikura, H., Osuna, S., Romero, E., Ryan, K.S., Turner, N.J., Flitsch, S.L.: Biocatalysis. Nature Reviews Methods Primers 1(1), 1–21 (2021). https://doi.org/10.1038/s43586-021-00044-z 1


[6] Benevenuta, S., Pancotti, C., Fariselli, P., Birolo, G., Sanavia, T.: An antisymmetric neural network to predict free energy changes in protein variants. Journal of Physics D: Applied Physics 54(24), 245403 (2021)


[7] Benevenuta, S., Pancotti, C., Fariselli, P., Birolo, G., Sanavia, T.: An antisymmetric neural network to predict free energy changes in protein variants. Journal of Physics D: Applied Physics 54(24), 245403 (2021) 8


[8] Benevenuta, S., Birolo, G., Sanavia, T., Capriotti, E., Fariselli, P.: Challenges in predicting stabilizing variations: An exploration. Frontiers in Molecular Biosciences 9, 1075570 (2023) 5, 6, 17


[9] Brückner, A., Polge, C., Lentze, N., Auerbach, D., Schlattner, U.: Yeast two-hybrid, a powerful tool for systems biology. International journal of molecular sciences 10(6), 2763–2788 (2009) 1


[10] Cadet, F., Saavedra, E., Syren, P.O., Gontero, B.: Machine learning, epistasis, and protein engineering: From sequence-structure-function relationships to regulation of metabolic pathways. Frontiers in Molecular Biosciences 9, 1098289 (2022) 1


[11] Capriotti, E., Fariselli, P., Casadio, R.: I-mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic acids research 33(suppl_2), W306–W310 (2005) 3, 8


[12] Chen, K., Arnold, F.H.: Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin e for catalysis in dimethylformamide. Proceedings of the National Academy of Sciences 90(12), 5618–5622 (1993) 1, 3


[13] Chen, T., Gong, C., Diaz, D.J., Chen, X., Wells, J.T., Wang, Z., Ellington, A., Dimakis, A., Klivans, A., et al.: Hotprotein: A novel framework for protein thermostability prediction and editing. In: The Eleventh International Conference on Learning Representations (2022) 3


[14] Chen, Y., Lu, H., Zhang, N., Zhu, Z., Wang, S., Li, M.: Premps: Predicting the impact of missense mutations on protein stability. PLoS computational biology 16(12), e1008543 (2020) 3, 8


[15] Cheng, J., Randall, A., Baldi, P.: Prediction of protein stability changes for single-site mutations using support vector machines. Proteins: Structure, Function, and Bioinformatics 62(4), 1125–1132 (2006) 3, 8


[16] Dehghanpoor, R., Ricks, E., Hursh, K., Gunderson, S., Farhoodi, R., Haspel, N., Hutchinson, B., Jagodzinski, F.: Predicting the effect of single and multiple mutations on protein structural stability. Molecules 23(2), 251 (2018) 4, 7


[17] Dehouck, Y., Kwasigroch, J.M., Gilis, D., Rooman, M.: Popmusic 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC bioinformatics 12(1), 1–12 (2011) 3, 8


[18] Diaz, D.J., Gong, C., Ouyang-Zhang, J., Loy, J.M., Wells, J.T., Yang, D., Ellington, A.J., Dimakis, A., Klivans, A.R.: Stability oracle: A structure-based graph-transformer for identifying stabilizing mutations (2023). https://doi.org/10.1101/2023.05.15.540857 3, 4, 7, 8, 16, 17, 18, 19


[19] Diaz, D.J., Kulikova, A.V., Ellington, A.D., Wilke, C.O.: Using machine learning to predict the effects and consequences of mutations in proteins. Current Opinion in Structural Biology 78, 102518 (2023) 3


[20] d’Oelsnitz, S., Diaz, D.J., Acosta, D.J., Schechter, M.W., Minus, M.B., Howard, J.R., Do, H., Loy, J., Alper, H., Ellington, A.D.: Synthetic microbial sensing and biosynthesis of amaryllidaceae alkaloids. bioRxiv pp. 2023–04 (2023) 3


[21] d’Oelsnitz, S., Kim, W., Burkholder, N.T., Javanmardi, K., Thyer, R., Zhang, Y., Alper, H.S., Ellington, A.D.: Using fungible biosensors to evolve improved alkaloid biosyntheses. Nature Chemical Biology 18(9), 981–989 (2022) 1


[22] Ellefson, J.W., Gollihar, J., Shroff, R., Shivram, H., Iyer, V.R., Ellington, A.D.: Synthetic evolutionary origin of a proofreading reverse transcriptase. Science 352(6293), 1590–1593 (2016) 1


[23] Feng, X., Sanchis, J., Reetz, M.T., Rabitz, H.: Enhancing the efficiency of directed evolution in focused enzyme libraries by the adaptive substituent reordering algorithm. Chemistry–A European Journal 18(18), 5646–5654 (2012) 3


[24] Frazer, J., Notin, P., Dias, M., Gomez, A., Min, J.K., Brock, K., Gal, Y., Marks, D.S.: Disease variant prediction with deep generative models of evolutionary data. Nature 599(7883), 91–95 (2021) 9


[25] Gebauer, M., Skerra, A.: Engineered protein scaffolds as next-generation therapeutics. Annual Review of Pharmacology and Toxicology 60, 391–415 (2020). https://doi.org/10.1146/annurev-pharmtox-010818-021118 1


[26] Gerasimavicius, L., Livesey, B.J., Marsh, J.A.: Correspondence between functional scores from deep mutational scans and predicted effects on protein stability. bioRxiv pp. 2023–02 (2023) 9


[27] Giver, L., Gershenson, A., Freskgard, P.O., Arnold, F.H.: Directed evolution of a thermostable esterase. Proceedings of the National Academy of Sciences 95(22), 12809–12813 (1998) 2, 3


[28] Hie, B.L., Shanker, V.R., Xu, D., Bruun, T.U., Weidenbacher, P.A., Tang, S., Wu, W., Pak, J.E., Kim, P.S.: Efficient evolution of human antibodies from general protein language models. Nature Biotechnology (2023) 3


[29] Jaroszewicz, W., Morcinek-Orłowska, J., Pierzynowska, K., Gaffke, L., Węgrzyn, G.: Phage display and other peptide display technologies. FEMS Microbiology Reviews 46(2), fuab052 (2022) 1


[30] Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of ir techniques. ACM Transactions on Information Systems (TOIS) 20(4), 422–446 (2002) 6, 17


[31] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., et al.: Highly accurate protein structure prediction with alphafold. Nature 596(7873), 583–589 (2021) 2, 3, 4, 6, 9, 10


[32] Kellogg, E.H., Leaver-Fay, A., Baker, D.: Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins: Structure, Function, and Bioinformatics 79(3), 830–838 (2011) 3


[33] Klabunde, T., Petrassi, H.M., Oza, V.B., Raman, P., Kelly, J.W., Sacchettini, J.C.: Rational design of potent human transthyretin amyloid disease inhibitors. Nature structural biology 7(4), 312–321 (2000) 1, 2


[34] Kouba, P., Kohout, P., Haddadi, F., Bushuiev, A., Samusevich, R., Sedlar, J., Damborsky, J., Pluskal, T., Sivic, J., Mazurenko, S.: Machine learning-guided protein engineering. ACS Catalysis 13, 13863–13895 (2023) 3


[35] Laimer, J., Hofer, H., Fritz, M., Wegenkittl, S., Lackner, P.: Maestro-multi agent stability prediction upon point mutations. BMC bioinformatics 16(1), 1–13 (2015) 3, 4, 7, 8


[36] Li, B., Yang, Y.T., Capra, J.A., Gerstein, M.B.: Predicting changes in protein thermodynamic stability upon point mutation with deep 3d convolutional neural networks. PLoS computational biology 16(11), e1008291 (2020) 3, 4, 8


[37] Li, G., Panday, S.K., Alexov, E.: Saafec-seq: a sequence-based method for predicting the effect of single point mutations on protein thermodynamic stability. International journal of molecular sciences 22(2), 606 (2021) 3


[38] Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S., et al.: Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv (2022) 3, 4, 6, 9, 10, 18


[39] Loell, K., Nanda, V.: Marginal protein stability drives subcellular proteome isoelectric point. Proceedings of the National Academy of Sciences 115(46), 11778–11783 (2018) 1


[40] Lu, H., Diaz, D.J., Czarnecki, N.J., Zhu, C., Kim, W., Shroff, R., Acosta, D.J., Alexander, B.R., Cole, H.O., Zhang, Y., et al.: Machine learning-aided engineering of hydrolases for pet depolymerization. Nature 604(7907), 662–667 (2022) 3


[41] Meghwanshi, G.K., Kaur, N., Verma, S., Dabi, N.K., Vashishtha, A., Charan, P.D., Purohit, P., Bhandari, H.S., Bhojak, N., Kumar, R.: Enzymes for pharmaceutical and therapeutic applications. Biotechnology and Applied Biochemistry 67(4), 586–601 (2020). https://doi.org/10.1002/bab.1919 1


[42] Meier, J., Rao, R., Verkuil, R., Liu, J., Sercu, T., Rives, A.: Language models enable zero-shot prediction of the effects of mutations on protein function. Advances in Neural Information Processing Systems 34, 29287–29303 (2021) 9, 18


[43] Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., Steinegger, M.: Colabfold: making protein folding accessible to all. Nature methods 19(6), 679–682 (2022) 6


[44] Montanucci, L., Capriotti, E., Frank, Y., Ben-Tal, N., Fariselli, P.: Ddgun: an untrained method for the prediction of protein stability changes upon single and multiple point variations. BMC bioinformatics 20, 1–10 (2019) 3, 4, 7, 8, 16, 18


[45] Montanucci, L., Savojardo, C., Martelli, P.L., Casadio, R., Fariselli, P.: On the biases in predictions of protein stability changes upon variations: the inps test case. Bioinformatics 35(14), 2525–2527 (2019) 16


[46] Musdal, Y., Govindarajan, S., Mannervik, B.: Exploring sequence-function space of a poplar glutathione transferase using designed information-rich gene variants. Protein Engineering, Design and Selection 30(8), 543–549 (2017) 3


[47] Nijkamp, E., Ruffolo, J., Weinstein, E.N., Naik, N., Madani, A.: Progen2: exploring the boundaries of protein language models. arXiv preprint arXiv:2206.13517 (2022) 9


[48] Nikam, R., Kulandaisamy, A., Harini, K., Sharma, D., Gromiha, M.M.: Prothermdb: thermodynamic database for proteins and mutants revisited after 15 years. Nucleic acids research 49(D1), D420–D424 (2021) 2, 16


[49] Notin, P., Dias, M., Frazer, J., Hurtado, J.M., Gomez, A.N., Marks, D., Gal, Y.: Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In: International Conference on Machine Learning. pp. 16990–17017. PMLR (2022) 2, 3, 9, 16, 18


[50] Notin, P.M., Van Niekerk, L., Kollasch, A.W., Ritter, D., Gal, Y., Marks, D.: Trancepteve: Combining family-specific and family-agnostic models of protein sequences for improved fitness prediction. bioRxiv pp. 2022–12 (2022) 9


[51] Nutschel, C., Fulton, A., Zimmermann, O., Schwaneberg, U., Jaeger, K.E., Gohlke, H.: Systematically scrutinizing the impact of substitution sites on thermostability and detergent tolerance for bacillus subtilis lipase a. Journal of chemical information and modeling 60(3), 1568–1584 (2020) 9


[52] Paik, I., Ngo, P.H., Shroff, R., Diaz, D.J., Maranhao, A.C., Walker, D.J., Bhadra, S., Ellington, A.D.: Improved bst dna polymerase variants derived via a machine learning approach. Biochemistry (2021) 3


[53] Pak, M.A., Markhieva, K.A., Novikova, M.S., Petrov, D.S., Vorobyev, I.S., Maksimova, E.S., Kondrashov, F.A., Ivankov, D.N.: Using alphafold to predict the impact of single mutations on protein stability and function. Plos one 18(3), e0282689 (2023) 3


[54] Pancotti, C., Benevenuta, S., Birolo, G., Alberini, V., Repetto, V., Sanavia, T., Capriotti, E., Fariselli, P.: Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset. Briefings in Bioinformatics 23(2), bbab555 (2022) 2, 8, 16, 17


[55] Pancotti, C., Benevenuta, S., Repetto, V., Birolo, G., Capriotti, E., Sanavia, T., Fariselli, P.: A deep-learning sequence-based method to predict protein stability changes upon genetic variations. Genes 12(6), 911 (2021) 8


[56] Pires, D.E., Ascher, D.B., Blundell, T.L.: Duet: a server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic acids research 42(W1), W314–W319 (2014) 3, 8


[57] Pires, D.E., Ascher, D.B., Blundell, T.L.: mcsm: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 30(3), 335–342 (2014) 8


[58] Pucci, F., Schwersensky, M., Rooman, M.: Artificial intelligence challenges for predicting the impact of mutations on protein stability. Current opinion in structural biology 72, 161–168 (2022) 17


[59] Qiu, Y., Wei, G.W.: Persistent spectral theory-guided protein engineering. Nature Computational Science pp. 1–15 (2023) 3, 6, 17


[60] Rao, R.M., Liu, J., Verkuil, R., Meier, J., Canny, J., Abbeel, P., Sercu, T., Rives, A.: Msa transformer. In: International Conference on Machine Learning. pp. 8844–8856. PMLR (2021) 3, 9, 10


[61] Riesselman, A.J., Ingraham, J.B., Marks, D.S.: Deep generative models of genetic variation capture the effects of mutations. Nature methods 15(10), 816–822 (2018) 9


[62] Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C.L., Ma, J., Fergus, R.: Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS (2019). https://doi.org/10.1101/622803 3, 7


[63] Rodrigues, C.H., Pires, D.E., Ascher, D.B.: Dynamut: predicting the impact of mutations on protein conformation, flexibility and stability. Nucleic acids research 46(W1), W350–W355 (2018) 3, 8


[64] Rodrigues, C.H., Pires, D.E., Ascher, D.B.: Dynamut2: Assessing changes in stability and flexibility upon single and multiple point missense mutations. Protein Science 30(1), 60–69 (2021) 3, 4, 7


[65] Sanavia, T., Birolo, G., Montanucci, L., Turina, P., Capriotti, E., Fariselli, P.: Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Computational and Structural Biotechnology Journal 18, 1968–1979 (2020). https://doi.org/10.1016/j.csbj.2020.07.011 6, 17


[66] Savojardo, C., Fariselli, P., Martelli, P.L., Casadio, R.: Inps-md: a web server to predict stability of protein variants from sequence and structure. Bioinformatics 32(16), 2542–2544 (2016) 3, 8


[67] Schymkowitz, J., Borg, J., Stricher, F., Nys, R., Rousseau, F., Serrano, L.: The foldx web server: an online force field. Nucleic acids research 33(suppl_2), W382–W388 (2005) 3, 7, 8, 18


[68] Shroff, R., Cole, A.W., Diaz, D.J., Morrow, B.R., Donnell, I., Annapareddy, A., Gollihar, J., Ellington, A.D., Thyer, R.: Discovery of novel gain-of-function mutations guided by structure-based deep learning. ACS synthetic biology 9(11), 2927–2935 (2020) 3


[69] Starr, T.N., Thornton, J.W.: Epistasis in protein evolution. Protein science 25(7), 1204–1218 (2016) 1


[70] Steinegger, M., Söding, J.: Mmseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature biotechnology 35(11), 1026–1028 (2017) 6


[71] Stemmer, W.P.: Rapid evolution of a protein in vitro by dna shuffling. Nature 370(6488), 389–391 (1994) 2, 3


[72] Tsuboyama, K., Dauparas, J., Chen, J., Laine, E., Mohseni Behbahani, Y., Weinstein, J.J., Mangan, N.M., Ovchinnikov, S., Rocklin, G.J.: Mega-scale experimental analysis of protein folding stability in biology and protein design. bioRxiv pp. 2022–12 (2022) 1, 2, 6, 16


[73] Tung, J.W., Heydari, K., Tirouvanziam, R., Sahaf, B., Parks, D.R., Herzenberg, L.A., Herzenberg, L.A.: Modern flow cytometry: a practical approach. Clinics in laboratory medicine 27(3), 453–468 (2007) 1


[74] Umerenkov, D., Shashkova, T.I., Strashnov, P.V., Nikolaev, F., Sindeeva, M., Ivanisenko, N.V., Kardymon, O.L.: Prostata: Protein stability assessment using transformers. bioRxiv pp. 2022–12 (2022) 3, 4, 6, 7, 8, 18, 19


[75] Wittmann, B.J., Johnston, K.E., Wu, Z., Arnold, F.H.: Advances in machine learning for directed evolution. Current opinion in structural biology 69, 11–18 (2021) 3


[76] Wittmann, B.J., Yue, Y., Arnold, F.H.: Informed training set design enables efficient machine learning-assisted directed protein evolution. Cell Systems 12(11), 1026–1045 (2021) 3


[77] Worth, C.L., Preissner, R., Blundell, T.L.: Sdm—a server for predicting effects of mutations on protein stability and malfunction. Nucleic acids research 39(suppl_2), W215–W222 (2011) 3, 8


[78] Wu, S., Snajdrova, R., Moore, J.C., Baldenius, K., Bornscheuer, U.T.: Biocatalysis: Enzymatic Synthesis for Industrial Applications. Angewandte Chemie - International Edition 60(1), 88–119 (2021). https://doi.org/10.1002/anie.202006648 1


[79] Yeung, N., Lin, Y.W., Gao, Y.G., Zhao, X., Russell, B.S., Lei, L., Miner, K.D., Robinson, H., Lu, Y.: Rational design of a structural and functional nitric oxide reductase. Nature 462(7276), 1079–1082 (2009) 1, 2


[80] Zhou, B., Lv, O., Yi, K., Xiong, X., Tan, P., Hong, L., Wang, Y.G.: Accurate and definite mutational effect prediction with lightweight equivariant graph neural networks. arXiv preprint arXiv:2304.08299 (2023) 3


[81] Zhou, Y., Pan, Q., Pires, D.E., Rodrigues, C.H., Ascher, D.B.: Ddmut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Research p. gkad472 (2023) 3