This story draft by @escholar has not been reviewed by an editor, YET.

Maximizing Use-Case Specificity through Precision Model Tuning: Results and References

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture
0-item

Table of Links

  1. Abstract and Introduction
  2. Precision tuning for protein modeling
  3. QA Task Performance
  4. Results and References

Results

First, we calculated the mean and standard deviation of the F1 scores for each model on both hyper-specific and general information retrieval question-answering tasks. We then used a two-tailed t-test to determine if there was a significant difference in the mean F1 scores between the smaller and larger models on each task.


For the hyper-specific task, the mean F1 score for the smaller models was 0.87 with a standard deviation of 0.03, while the mean F1 score for the larger models was 0.82 with a standard deviation of 0.05. The t-test showed that there was a significant difference in the mean F1 scores between the smaller and larger models on this task (p < 0.05). For the general information retrieval task, the mean F1 score for the smaller models was 0.84 with a standard deviation of 0.03, while the mean F1 score for the larger models was 0.86 with a standard deviation of 0.02. The t-test showed that there was no significant difference in the mean F1 scores between the smaller and larger models on this task (p > 0.05).Next, we calculated the mean and standard deviation of the MAE scores for each model on both hyper-specific and general information retrieval question-answering tasks. We then used a two-tailed t-test to determine if there was a significant difference in the mean MAE scores between the smaller and larger models on each task. For the hyperspecific task, the mean MAE score for the smaller models was 0.12 with a standard deviation of 0.01, while the mean MAE score for the larger models was 0.14 with a standard deviation of 0.02. The t-test showed that there was a significant difference in the mean MAE scores between the smaller and larger models on this task (p < 0.05). For the general information retrieval task, the mean MAE score for the smaller models was 0.13 with a standard deviation of 0.01, while the mean MAE score for the larger models was 0.11 with a standard deviation of 0.01. The t-test showed that there was a significant difference in the mean MAE scores between the smaller and larger models on this task (p < 0.05). Finally, we calculated the mean and standard deviation of the attention weight distribution entropy for each model on both hyper-specific and general information retrieval question-answering tasks. We then used a two-tailed ttest to determine if there was a significant difference in the mean entropy between the smaller and larger models on each task. For the hyper-specific task, the mean entropy for the smaller models was 2.34 with a standard deviation of 0.06, while the mean entropy for the larger models was 2.25 with a standard deviation of 0.08. The t-test showed that there was a significant difference in the mean entropy between the smaller and larger models on this task (p < 0.05). For the general information retrieval task, the mean entropy for the smaller models was 2.32 with a standard deviation of 0.05, while the mean entropy for the larger models was 2.28 with a standard deviation of 0.07.


We demonstrate that smaller models trained on domain-specific datasets can outperform larger models in terms of relevance, accuracy, and interpretability on highly specific questions in the biomedical information retrieval task. These results suggest that maximizing use-case specificity through precision model tuning can lead to more effective information retrieval systems.


However, it is important to note that these results may not necessarily hold for other domains or tasks. Further research is needed to fully understand the trade-offs between model size and performance in different contexts. Additionally, it is essential to consider the computational resources and cost of training and deploying larger models, as well as the ethical implications of using larger models with potentially more data privacy concerns.

References

Hsu, Y., et al. "Fine-Tuning Pretrained Language Models for Self-Attention Based Sentiment Analysis." arXiv preprint arXiv:2003.06022 (2020).


Vaswani, A., et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.


Devlin, J., et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).


GPT-3 175B. "Language Models are Few-Shot Learners." OpenAI (2020).


BLOOM 176B. "A Beginner's Guide to BLOOM." OpenAI (2020).


DeepMind's RETRO model. "RETRO: A Self-Supervised Approach for Pretraining on Task-Oriented Dialogues." DeepMind (2020).


GPT-J 6B. "GPT-J: A Pre-training Framework for Task-Oriented Dialogues." OpenAI (2020).


Wang, A., et al. "Fine-Tuning BERT for Review Rating Prediction." arXiv preprint arXiv:1906.04165 (2019).


Howard, J., and S. Ruder. "Universal Language Model Fine-tuning for Text Classification." arXiv preprint arXiv:1801.06146 (2018).


Liu, Y., et al. "On the Variance of the Adaptive Learning Rate and Beyond." International Conference on Machine Learning. 2019.


Beck, D., et al. "Revisiting Few-Sample BERT Fine-tuning." arXiv preprint arXiv:2010.13786 (2020).


Houlsby, N., et al. "Surgical Fine-Tuning of Neural Networks." International Conference on Machine Learning. 2019.


Khandelwal, U., et al. "Discriminative Fine-Tuning of Language Models." arXiv preprint arXiv:2005.14165 (2020).


Zhang, Y., et al. "BERT Fine-Tuning Tips and Tricks." arXiv preprint arXiv:1905.05583 (2019).


Yosinski, J., et al. "How transferable are features in deep neural networks?" Advances in Neural Information Processing Systems. 2014.


Shin, H., et al. "Continual Pre-Training for Language Understanding." arXiv preprint arXiv:2006.04194 (2020).


Howard, J., and S. Ruder. "Fine-tuning Pretrained Language Models." arXiv preprint arXiv:2009.11557 (2020).


"Fine-Tuning BERT for Review Classification." by Wang, Yuxuan and Fu, Yuting and Lu, Zhenhui and Zhang, Weinan and Zhu, Jun. Published in arXiv preprint arXiv:1905.05583 in 2019.


Investigating the Transferability of a Deep Neural Network for Sentiment Analysis in the Legal Domain." by Hahn, Udo and Frank, Anette. Published in the Journal of Artificial Intelligence Research in 2016.


"Transfer Learning for Sentiment Analysis with Deep Neural Networks." by Yoon, Kim and Kim, Yoon. Published in the Proceedings of the International Conference on Machine Learning and Data Mining in 2017.


"A Comparative Study on Transfer Learning for Sentiment Analysis." by Zhang, Weinan and Fu, Yuting and Lu, Zhenhui and Zhu, Jun. Published in the Proceedings of the International Conference on Machine Learning and Data Mining in 2018.


Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the International Conference on Language Resources and Evaluation (pp. 4171-4182).


Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2018). Language models are unsupervised multitask learners. OpenAI.


Howard, J., & Ruder, S. (2018). Fine-tuned Language Models for Text Classification. ArXiv Preprint, arXiv:1801.06146.


Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Neelakantan, A. (2020). Language Models are Few-Shot Learners. OpenAI.


Kim, Y., & Lin, Y. I. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv Preprint, arXiv:1907.11692.


Lee, J., Yoon, W., & Kim, S. (2019). Surgical Fine-tuning for Biomedical Information Retrieval. ArXiv Preprint, arXiv:1907.08051.


Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). Roberta: A Robustly Optimized BERT Pretraining Approach. ArXiv Preprint, arXiv:1907.11692.


Fan, Y., Chen, S., Feng, X., & Liu, B. (2018). A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1692-1701).


Authors:

(1) Pranjali Awasthi;

(2) David Recio-Mitter;

(3) Yosuke Kyle Sugi.


This paper is available on arxiv under CC BY 4.0 DEED license.


L O A D I N G
. . . comments & more!

About Author

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture
EScholar: Electronic Academic Papers for Scholars@escholar
We publish the best academic work (that's too often lost to peer reviews & the TA's desk) to the global tech community

Topics

Around The Web...

Trending Topics

blockchaincryptocurrencyhackernoon-top-storyprogrammingsoftware-developmenttechnologystartuphackernoon-booksBitcoinbooks