Author:
(1) Mingda Chen.
In this chapter, we described the sentence ordering prediction (SOP) loss for pretraining language models and a pretraining technique for improving model performance on in-context few-shot learning. In Section 3.1, we compared SOP to next sentence prediction (NSP) on both intrinsic and extrinsic tasks, finding that the two losses encode different information, and we showed that models pretrained with SOP achieve state-of-the-art performance across various benchmarks. In Section 3.2, we benchmarked four self-supervised objectives on 13 downstream tasks, finding that, like human-annotated datasets, self-supervision can also improve downstream task performance. Our analysis uncovered several factors that contributed to these improvements, including the dataset sizes for the self-supervised tasks, the few-shot templates, and the semantic similarity between the training and evaluation tasks. In addition, we experimented with concatenating forward and backward language modeling losses to achieve bidirectional training for language models, finding that our proposed approaches perform better than the unidirectional language modeling loss but worse than masked language modeling on downstream tasks (see Appendix A.1 for more details).
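To make the SOP objective concrete, the sketch below builds binary classification examples from consecutive segments of a single document: in-order pairs are positives and swapped pairs are negatives, which is the key contrast with NSP, whose negatives are drawn from a different document. This is a minimal illustration only; the helper name `make_sop_examples`, the 50% swap rate, and the dictionary output format are assumptions for readability, not the exact pretraining pipeline described in the chapter.

```python
import random


def make_sop_examples(segments, swap_prob=0.5, seed=0):
    """Build toy sentence ordering prediction (SOP) examples from a list of
    consecutive text segments drawn from the same document.

    Positive examples (label 1) keep two consecutive segments in their
    original order; negative examples (label 0) swap them, so a classifier
    must model inter-sentence coherence rather than topic overlap alone.
    """
    rng = random.Random(seed)
    examples = []
    for first, second in zip(segments, segments[1:]):
        if rng.random() < swap_prob:
            # Negative example: order reversed, the model should detect the swap.
            examples.append({"segment_a": second, "segment_b": first, "label": 0})
        else:
            # Positive example: segments appear in their original document order.
            examples.append({"segment_a": first, "segment_b": second, "label": 1})
    return examples


if __name__ == "__main__":
    document = [
        "The model is pretrained on unlabeled text with the SOP objective.",
        "It is then fine-tuned on downstream benchmarks.",
        "We report both intrinsic and extrinsic evaluations.",
    ]
    for ex in make_sop_examples(document):
        print(ex["label"], "|", ex["segment_a"], "->", ex["segment_b"])
```

Because both segments always come from the same document, the only signal separating positives from negatives is their order, which is why SOP can encode discourse coherence information that NSP does not.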
This paper is available on arXiv under a CC 4.0 license.