Author:
(1) Mingda Chen.
3.1 Improving Language Representation Learning via Sentence Ordering Prediction
3.2 Improving In-Context Few-Shot Learning via Self-Supervised Training
4.2 Learning Discourse-Aware Sentence Representations from Document Structures
5 DISENTANGLING LATENT REPRESENTATIONS FOR INTERPRETABILITY AND CONTROLLABILITY
5.1 Disentangling Semantics and Syntax in Sentence Representations
5.2 Controllable Paraphrase Generation with a Syntactic Exemplar
In this chapter, we describe the background materials needed for the remainder of this thesis. In Chapter 3, we present our contributions to improving self-supervised training objectives for language model pretraining. The new training objectives improve the quality of general language representations and model performance on few-shot learning. Chapter 4 presents our contributions to exploiting naturally occurring data structures on Wikipedia for entity and sentence representations and textual entailment. Chapter 5 presents our contributions to leveraging freely available parallel corpora for disentangling semantic and syntactic representations; we then apply the technique to controlling the syntax of generated sentences using a sentential exemplar. Chapter 6 presents our contributed datasets for data-to-text generation, abstractive summarization, and story generation. They are derived from naturally occurring textual resources and pose unique challenges in their respective task settings.
This paper is available on arXiv under a CC 4.0 license.