TopLapGBT: Bridging Gaps in Protein Solubility Prediction with Cutting-Edge Fusion

by The Mutation Publication, February 17th, 2024

Too Long; Didn't Read

TopLapGBT fuses persistent Laplacian features, which encode geometry and topology, with pretrained transformer features to predict how mutations change protein solubility. It reaches normalized CPR and GC2 scores of 0.688 and 0.361, outperforming the previous state of the art, PON-Sol2, and it keeps that lead on an independent blind test.


(1) JunJie Wee, Department of Mathematics, Michigan State University;

(2) Jiahui Chen, Department of Mathematical Sciences, University of Arkansas;

(3) Kelin Xia, Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University & [email protected];

(4) Guo-Wei Wei, Department of Mathematics, Michigan State University, Department of Biochemistry and Molecular Biology, Michigan State University, Department of Electrical and Computer Engineering, Michigan State University & [email protected].

4 Conclusion

In the multifaceted quest to understand mutation-induced solubility changes, various scientific domains, including quantum mechanics, molecular mechanics, biochemistry, biophysics, and molecular biology, have made significant contributions. Despite these concerted efforts, state-of-the-art models remain limited, as evidenced by a normalized CPR value of only 0.656 even after feature selection. Persistent homology (PH) has emerged as a powerful tool for capturing the complexity of biomolecular structures and has achieved noteworthy success in drug discovery applications. However, PH cannot capture homotopic shape evolution, which is crucial for delineating molecular interactions in proteins, and this is a critical shortcoming.
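To make the contrast between topological invariants and shape evolution concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a toy point cloud and uses the ordinary 0-th order graph Laplacian at several distance thresholds as a simplified stand-in for a persistent Laplacian. The function names `graph_laplacian` and `spectrum_summary` are hypothetical. The count of zero eigenvalues gives the number of connected components (the Betti-0 number, which is what homology sees), while the smallest nonzero eigenvalue keeps changing even when that count does not.

```python
import numpy as np

def graph_laplacian(points, radius):
    """0-th order combinatorial Laplacian of the graph that links
    every pair of points whose distance is at most `radius`."""
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adjacency = ((dists <= radius) & (dists > 0)).astype(float)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def spectrum_summary(points, radius, tol=1e-8):
    """Return (number of zero eigenvalues, smallest nonzero eigenvalue).
    The first value is the Betti-0 number; the second is a simple
    non-harmonic spectral feature (0.0 if every eigenvalue is zero)."""
    eigvals = np.linalg.eigvalsh(graph_laplacian(points, radius))
    betti0 = int(np.sum(eigvals < tol))
    nonzero = eigvals[eigvals >= tol]
    smallest_nonzero = float(nonzero.min()) if nonzero.size else 0.0
    return betti0, smallest_nonzero

# Toy data (an assumption for illustration): four points on a line.
points = [[0, 0], [1, 0], [2, 0], [3, 0]]

for radius in (0.5, 1.0, 2.0, 3.0):
    betti0, lam = spectrum_summary(points, radius)
    print(f"radius={radius:.1f}  betti0={betti0}  smallest nonzero eigenvalue={lam:.3f}")
```

In this toy case the points form a single component at radii 1.0, 2.0, and 3.0, so the Betti-0 number stays at 1, yet the smallest nonzero eigenvalue differs at each scale. That extra spectral signal is the kind of information the conclusion attributes to persistent Laplacian features.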

Our study introduces TopLapGBT, a novel model that integrates persistent Laplacian (PL) features with pretrained transformer features, thereby bridging the gap in capturing both topology and homotopic shape evolution. This innovative fusion leads to significant advancements in classification performance. Specifically, TopLapGBT achieves normalized CPR and GC2 scores of 0.688 and 0.361, respectively, marking improvements of 4.88% and 15.71% over the state-of-the-art PON-Sol2. These findings are further corroborated by an independent blind test, where TopLapGBT continues to outperform existing models.
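The reported percentages are relative improvements, and the arithmetic can be checked with a short sketch. The helper names `relative_improvement` and `implied_baseline` are hypothetical. The CPR figures 0.656 and 0.688 both appear in this section. The GC2 baseline is not quoted here, so the value computed below is only what the stated 15.71% implies, not a reported result.

```python
def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0

def implied_baseline(new, improvement_pct):
    """Baseline score implied by a new score and a stated percentage gain."""
    return new / (1.0 + improvement_pct / 100.0)

# Normalized CPR: both values are quoted in this section.
cpr_gain = relative_improvement(0.688, 0.656)
print(f"CPR improvement: {cpr_gain:.2f}%")

# GC2: only the new score and the percentage are quoted, so the
# baseline below is inferred from them rather than taken from the paper.
gc2_baseline = implied_baseline(0.361, 15.71)
print(f"Implied baseline GC2: {gc2_baseline:.3f}")
```

The CPR check reproduces the stated 4.88%, and the implied GC2 baseline comes out at roughly 0.312.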

In summary, our proposed TopLapGBT model not only achieves superior performance over existing state-of-the-art methods but also introduces a more nuanced approach for the classification of protein solubility changes upon mutation. These results underscore the transformative potential of integrating geometric and topological features with machine learning in advancing the field of molecular biology.

This paper is available on arXiv under a CC 4.0 license.