This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Jeffrey Ouyang-Zhang, UT Austin
(2) Daniel J. Diaz, UT Austin
(3) Adam R. Klivans, UT Austin
(4) Philipp Krähenbühl, UT Austin
First, our training dataset contains biases that may affect model performance. It contains only small proteins, which may limit performance on larger ones, and its imbalance across mutation types may bias the model toward certain types of mutations. Second, the limited availability of experimental stability data makes in-silico evaluation difficult; evaluation on larger and more diverse datasets is necessary to fully assess how well our model generalizes. In the future, we hope that high-throughput experimental assays will enable more rigorous evaluation and further improvements in protein stability prediction.