Improving Self-Supervision for Language Pretraining: Summary

Written by textmodels | Published 2024/06/01
Tech Story Tags: natural-supervision | self-supervision | language-pretraining | improving-self-supervision | self-supervised-training | next-sentence-prediction | pretrained-language-models | improving-llm-performance

TL;DR: Prior work has found that the next sentence prediction loss used for pretraining is ineffective in improving downstream task performance.

Author:

(1) Mingda Chen.

3.3 Summary

In this chapter, we described the sentence ordering prediction (SOP) loss for pretraining language models and a pretraining technique for improving model performance on in-context few-shot learning. In Section 3.1, we compared SOP to next sentence prediction (NSP) on both intrinsic and extrinsic tasks, finding that the two losses encode different information. We showed that models pretrained with SOP achieve state-of-the-art performance across various benchmarks. In Section 3.2, we benchmarked 4 self-supervised training tasks on 13 downstream tasks, finding that, similar to human-annotated datasets, self-supervision can also lead to improved downstream task performance. Our analysis uncovered several factors that contributed to the improvements, including the dataset sizes for the self-supervised tasks, the few-shot templates, and the semantic similarities between the training and evaluation tasks. In addition, we experimented with concatenating forward and backward language modeling losses to achieve bidirectional training for language models, finding that our proposed approaches showed better performance than the unidirectional language modeling loss but worse than masked language modeling on downstream tasks (see Appendix A.1 for more details).
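To make the contrast drawn in Section 3.1 concrete, the sketch below shows one common way NSP and SOP training instances are constructed from a document. This is a minimal illustration, not code from the paper; the function names and the toy corpus are assumptions made for the example.

```python
import random

def make_nsp_example(doc, corpus):
    """Next sentence prediction (NSP): the negative pairs a sentence with one
    sampled from a (possibly different) document, so the task can often be
    solved by topic cues alone."""
    i = random.randrange(len(doc) - 1)
    first = doc[i]
    if random.random() < 0.5:
        return first, doc[i + 1], 1            # positive: true next sentence
    other_doc = random.choice(corpus)          # negative: random sentence
    return first, random.choice(other_doc), 0

def make_sop_example(doc):
    """Sentence ordering prediction (SOP): both sentences are adjacent in the
    same document; the negative simply swaps their order, so the model must
    learn inter-sentence coherence rather than topic."""
    i = random.randrange(len(doc) - 1)
    first, second = doc[i], doc[i + 1]
    if random.random() < 0.5:
        return first, second, 1                # positive: original order
    return second, first, 0                    # negative: swapped order

# Toy corpus for illustration only.
corpus = [
    ["The storm knocked out power overnight.", "Crews restored it by morning.",
     "Few outages were reported afterward."],
    ["Transformers rely on self-attention.", "Attention cost grows with sequence length."],
]
print(make_nsp_example(corpus[0], corpus))
print(make_sop_example(corpus[0]))
```

Under this framing, NSP negatives mix ordering and topic signals, while SOP isolates sentence order, which is consistent with the observation above that the two losses encode different information.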

This paper is available on arXiv under a CC 4.0 license.

