Improving Self-Supervision for Language Pretraining: Summary

Written by textmodels | Published 2024/06/01
Tech Story Tags: natural-supervision | self-supervision | language-pretraining | improving-self-supervision | self-supervised-training | next-sentence-prediction | pretrained-language-models | improving-llm-performance

TL;DR: Prior work has found that the next sentence prediction loss used for pretraining is ineffective in improving downstream task performance.

Author:

(1) Mingda Chen.

3.3 Summary

In this chapter, we described the sentence ordering prediction (SOP) loss for pretraining language models and a pretraining technique for improving model performance on in-context few-shot learning. In Section 3.1, we compared SOP to next sentence prediction (NSP) on both intrinsic and extrinsic tasks, finding that the two losses encode different information. We showed that models pretrained with SOP achieve state-of-the-art performance across various benchmarks. In Section 3.2, we benchmarked 4 self-supervised training tasks on 13 downstream tasks, finding that, similar to human-annotated datasets, self-supervision can also lead to improved downstream task performance. Our analysis uncovered several factors that contributed to the improvements, including the dataset sizes for the self-supervised tasks, the few-shot templates, and the semantic similarities between the training and evaluation tasks. In addition, we experimented with concatenating forward and backward language modeling losses to achieve bidirectional training for language models, finding that our proposed approaches showed better performance than the unidirectional language modeling loss but worse than masked language modeling on downstream tasks (see Appendix A.1 for more details).
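To make the contrast drawn in Section 3.1 concrete, the sketch below shows one common way NSP and SOP training instances are constructed from a document. This is a minimal illustration, not code from the paper; the function names and the toy corpus are assumptions made for the example.

```python
import random

def make_nsp_example(doc, corpus):
    """Next sentence prediction (NSP): the negative pairs a sentence with one
    sampled from a (possibly different) document, so the task can often be
    solved by topic cues alone."""
    i = random.randrange(len(doc) - 1)
    first = doc[i]
    if random.random() < 0.5:
        return first, doc[i + 1], 1            # positive: true next sentence
    other_doc = random.choice(corpus)          # negative: random sentence
    return first, random.choice(other_doc), 0

def make_sop_example(doc):
    """Sentence ordering prediction (SOP): both sentences are adjacent in the
    same document; the negative simply swaps their order, so the model must
    learn inter-sentence coherence rather than topic."""
    i = random.randrange(len(doc) - 1)
    first, second = doc[i], doc[i + 1]
    if random.random() < 0.5:
        return first, second, 1                # positive: original order
    return second, first, 0                    # negative: swapped order

# Toy corpus for illustration only.
corpus = [
    ["The storm knocked out power overnight.", "Crews restored it by morning.",
     "Few outages were reported afterward."],
    ["Transformers rely on self-attention.", "Attention cost grows with sequence length."],
]
print(make_nsp_example(corpus[0], corpus))
print(make_sop_example(corpus[0]))
```

Under this framing, NSP negatives mix ordering and topic signals, while SOP isolates sentence order, which is consistent with the observation above that the two losses encode different information.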

This paper is available on arXiv under a CC 4.0 license.

