Authors:
(1) JunJie Wee, Department of Mathematics, Michigan State University;
(2) Jiahui Chen, Department of Mathematical Sciences, University of Arkansas;
(3) Kelin Xia, Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University;
(4) Guo-Wei Wei, Department of Mathematics, Michigan State University, Department of Biochemistry and Molecular Biology, Michigan State University, Department of Electrical and Computer Engineering, Michigan State University.
Software and resources, Code and Data Availability
Protein sequences are first processed with AlphaFold 2 to generate wild-type protein structures. In particular, 3D protein structures are generated from protein sequences using ColabFold [52]. Mutant proteins are generated with the Jackal software [38]. All TopLapGBT models are built using the sklearn machine learning library [53]. The hyperparameters for all TopLapGBT models are: n_estimators = 20000, learning_rate = 0.05, max_depth = 7, subsample = 0.4, min_samples_split = 3, and max_features = sqrt. The PQR files, which contain the partial charge information of the proteins, are generated with the PDB2PQR software [54]. The PQR files for both the wild-type and mutant proteins are generated with the AMBER force field. The solvation energy and surface area information are calculated with the in-house online software packages ESES [55] and MIBPB [56]. The pKa values are computed with the PROPKA software package [57]. The position-specific scoring matrices (PSSM) are computed with the BLAST+ software [58] using the nr database. The secondary structure features and torsion angle sequence-based information are calculated with SPIDER [59]. The persistent Laplacian descriptors for both Vietoris-Rips (VR) complexes and alpha complexes are calculated using the GUDHI software library [60]. All computational work in support of this research was performed using resources from the National Supercomputing Centre (NSCC), Singapore.
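As an illustration of how the hyperparameters listed above map onto sklearn keyword arguments, a minimal sketch is given here. It is not the authors' training script: the choice of `GradientBoostingClassifier` (rather than the regressor), the names `TOPLAPGBT_PARAMS` and `build_model`, the random seed, and the synthetic feature matrix are all assumptions made for illustration, and "min sample split" is read as sklearn's `min_samples_split`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hyperparameters as stated in the text, written as sklearn keyword arguments.
TOPLAPGBT_PARAMS = {
    "n_estimators": 20000,
    "learning_rate": 0.05,
    "max_depth": 7,
    "subsample": 0.4,
    "min_samples_split": 3,
    "max_features": "sqrt",
}


def build_model(n_estimators=None, random_state=0):
    """Return a gradient boosting model with the listed hyperparameters.

    Hypothetical helper: `n_estimators` can be overridden to keep a quick
    demonstration cheap. The estimator class is an assumption.
    """
    params = dict(TOPLAPGBT_PARAMS)
    if n_estimators is not None:
        params["n_estimators"] = n_estimators
    return GradientBoostingClassifier(random_state=random_state, **params)


# Synthetic placeholder data standing in for the real feature vectors
# (persistent Laplacian, PSSM, and other descriptors are not computed here).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=10, n_classes=3, random_state=0
)

# Use far fewer trees than the 20000 in the text so the demo runs quickly.
model = build_model(n_estimators=50)
model.fit(X, y)
pred = model.predict(X[:5])
```

Calling `build_model()` with no arguments keeps the full 20000 trees from the text; with a learning rate of 0.05 and a subsample fraction of 0.4, each tree is fitted on a random 40% of the training rows, which is the stochastic variant of gradient boosting.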
The 3D protein structures and the TopLapGBT code are available at https://github.com/ExpectozJJ/TopLapGBT. The source code for the R-S plot is available at https://github.com/hozumiyu/RSI.
This paper is available on arXiv under a CC 4.0 license.