Software, Tools and Resources Utilized in TopLapGBT's Development

by The Mutation Publication, February 17th, 2024
Too Long; Didn't Read

Discover the software and resources crucial for developing TopLapGBT, from protein structure generation with AlphaFold 2 to machine learning with sklearn. Access code repositories and computational resources to delve into the intricacies of this groundbreaking model.

Authors:

(1) JunJie Wee, Department of Mathematics, Michigan State University;

(2) Jiahui Chen, Department of Mathematical Sciences, University of Arkansas;

(3) Kelin Xia, Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University & [email protected];

(4) Guo-Wei Wei, Department of Mathematics, Michigan State University, Department of Biochemistry and Molecular Biology, Michigan State University, Department of Electrical and Computer Engineering, Michigan State University & [email protected].

Abstract & Introduction

Results

Discussion

Conclusion

Materials and Methods

Software and resources, Code and Data Availability

Supporting Information, Acknowledgments & References

6 Software and resources

Protein sequences are first preprocessed by AlphaFold 2 to generate wild type protein structures. In particular, 3D protein structures are generated from protein sequences using ColabFold [52]. Mutant proteins are generated using the Jackal software [38]. All TopLapGBT models are built using the sklearn machine learning library [53]. The hyperparameters for all TopLapGBT models are: n_estimators = 20000, learning_rate = 0.05, max_depth = 7, subsample = 0.4, min_samples_split = 3, and max_features = sqrt. The PQR files, which contain the partial charge information of the proteins, are generated using the PDB2PQR software [54]. The PQR files for both the wild type and mutant proteins are generated with the AMBER force field. The solvation energy and surface area information are calculated using the in-house online software packages ESES [55] and MIBPB [56]. The pKa values are computed using the PROPKA software package [57]. The position-specific scoring matrices (PSSM) are computed using the BLAST+ software [58] with the nr database. The secondary structure and torsion angle features, which are sequence-based, are calculated using SPIDER [59]. The persistent Laplacian descriptors for both Vietoris-Rips (VR) complexes and alpha complexes are calculated using the GUDHI software library [60]. All computational work in support of this research was performed using the resources from the National Super Computing Centre of Singapore (NSCC).
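As an illustration of the hyperparameter settings listed above, the following is a minimal sketch of how they might map onto a scikit-learn gradient boosting model. It assumes the model is a GradientBoostingRegressor, which the text does not state explicitly; the names TOPLAPGBT_PARAMS and build_model, and the random_state argument, are hypothetical and are not taken from the TopLapGBT repository.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hyperparameters as listed in the text, written with scikit-learn's
# argument names. The dictionary name is a hypothetical label.
TOPLAPGBT_PARAMS = {
    "n_estimators": 20000,
    "learning_rate": 0.05,
    "max_depth": 7,
    "subsample": 0.4,
    "min_samples_split": 3,
    "max_features": "sqrt",
}


def build_model(n_estimators=None, random_state=0):
    """Return an unfitted regressor configured with the listed settings.

    Assumption: GradientBoostingRegressor is the estimator class.
    n_estimators can be overridden for quick experiments, because
    20000 trees take a long time to fit.
    """
    params = dict(TOPLAPGBT_PARAMS)
    if n_estimators is not None:
        params["n_estimators"] = n_estimators
    return GradientBoostingRegressor(random_state=random_state, **params)


def demo():
    """Fit a small model on synthetic data, only to show the workflow."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))   # placeholder feature matrix
    y = X[:, 0] - 2.0 * X[:, 1]    # placeholder regression target
    model = build_model(n_estimators=50)
    model.fit(X, y)
    return model.predict(X)
```

Calling build_model() with no arguments gives the full 20000-tree configuration; the demo function reduces the tree count and uses random placeholder features in place of the real descriptors, so its predictions carry no scientific meaning.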

Code and Data Availability

The 3D protein structures and the TopLapGBT code can be found at https://github.com/ExpectozJJ/TopLapGBT. The source code for the R-S plot can be found at https://github.com/hozumiyu/RSI.


This paper is available on arXiv under a CC 4.0 license.