Author:
(1) Mingda Chen.
In summary, this thesis makes the following contributions:
• By carefully designing self-supervision, we improve the quality of pretrained language models and their ability to generalize across tasks. In Section 3.1, we replace the next sentence prediction loss with a novel sentence ordering prediction loss in language model pretraining and show that this change leads to a series of state-of-the-art pretrained encoders (a minimal sketch of constructing such sentence-ordering examples follows this list). In Section 3.2, in contrast to previous work, which finetuned pretrained decoders on human-annotated datasets, we show that properly designed self-supervised tasks lead to similar gains in the in-context few-shot learning setting, improving models’ cross-task generalization.
• We design model architectures and training objectives that exploit the rich structure of Wikipedia articles. In Section 4.1, we leverage hyperlinks as supervision for pretraining entity representations, leading to models that can encode arbitrary entities (see the second sketch after this list for an illustration of hyperlink supervision). In Section 4.2, we use article structure, such as section and document titles, to train sentence representations; evaluation on discourse-related tasks shows that this training improves model performance. In Section 4.3, we extract training data from article category graphs and demonstrate that the extracted data improves model performance on textual entailment tasks. Together, these results reveal the advantages of structure-aware model pretraining.
• We leverage the paired structure of paraphrases and bilingual text to disentangle semantics and syntax in sentence representations, which allows us to learn interpretable and controllable neural models. In Section 5.1, we build the first neural models to disentangle semantics and syntax in sentence representations. The models exploit the fact that, for a paraphrase pair, the semantics is shared while the syntax varies. In addition to semantic evaluation metrics, we propose evaluation metrics for syntactic representations, finding that the best performance on both is achieved when the two latent representations are maximally disentangled (the third sketch after this list illustrates the underlying swapped-reconstruction idea). In Section 5.2, we adapt this framework to controlled paraphrasing, where we seek to control the output text with a syntactic, sentential exemplar. To formally define this controlled generation task, we annotate evaluation sets and propose evaluation metrics. In later work, we extend this framework and task setting to machine translation (Chen et al., 2020b), suggesting that the idea can generalize to any data with a paired structure.
• We demonstrate that challenging benchmark datasets for various long-form text generation tasks can be created by tailoring fan-contributed textual resources. We do so by defining new NLP tasks and studying them through extensive experiments. In Section 6.1, we generate arbitrary Wikipedia section text from various tabular data by casting the task as long-form data-to-text generation and creating a large-scale dataset. The task is challenging because models must generate a coherent passage that connects all the entities in the tabular data and fits the background knowledge that the data encodes. In Section 6.2, we summarize lengthy transcripts of TV shows. This task has several challenges: plot information is not stated explicitly but only implied in the dialogue, and models must draw information from a wide range of the input transcript. As characters are fundamental to TV show plots, we also propose two character-centric evaluation metrics. In Section 6.3, we generate long-form stories from character descriptions and summaries. This task poses several challenges for story generation models, including lengthy inputs and outputs and consistency in character modeling.
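To make the sentence ordering prediction objective of Section 3.1 concrete, the sketch below builds training examples from consecutive sentences of a document: positives keep the original order, negatives swap the two segments. This is a minimal illustration only; the function and field names (make_sop_examples, SOPExample) are assumptions for this sketch, not the thesis's preprocessing code.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SOPExample:
    segment_a: str
    segment_b: str
    label: int  # 1 = original order preserved, 0 = segments swapped


def make_sop_examples(sentences: List[str], swap_prob: float = 0.5,
                      seed: int = 0) -> List[SOPExample]:
    """Pair consecutive sentences from one document; randomly swap some
    pairs and label whether the original order was preserved."""
    rng = random.Random(seed)
    examples = []
    for first, second in zip(sentences, sentences[1:]):
        if rng.random() < swap_prob:
            examples.append(SOPExample(second, first, label=0))  # swapped
        else:
            examples.append(SOPExample(first, second, label=1))  # in order
    return examples


if __name__ == "__main__":
    doc = [
        "The committee met on Monday.",
        "It approved the new budget.",
        "Construction will begin next spring.",
    ]
    for ex in make_sop_examples(doc):
        print(ex.label, "|", ex.segment_a, "||", ex.segment_b)
```

A binary classifier on top of the pretrained encoder is then trained to predict the label, which forces the encoder to model discourse coherence rather than topic overlap.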
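For the hyperlink supervision of Section 4.1, the snippet below shows one simple way to extract (context, anchor text, target entity) triples from Wikipedia-style markup; each hyperlink gives a free natural-language description of the target entity. The regex and function name are illustrative assumptions, not the thesis's actual preprocessing pipeline.

```python
import re
from typing import List, Tuple

# Matches [[Target]] or [[Target|anchor text]] in Wikipedia-style markup.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def extract_entity_mentions(paragraph: str) -> List[Tuple[str, str, str]]:
    """Return (plain-text context, anchor text, target entity) triples."""
    triples = []
    # Replace each link with its surface form to recover the plain context.
    plain = WIKILINK.sub(lambda m: m.group(2) or m.group(1), paragraph)
    for match in WIKILINK.finditer(paragraph):
        target = match.group(1).strip()
        anchor = (match.group(2) or match.group(1)).strip()
        triples.append((plain, anchor, target))
    return triples


if __name__ == "__main__":
    text = ("[[Chicago]] sits on the shores of "
            "[[Lake Michigan|the lake]] in the Midwest.")
    for context, anchor, entity in extract_entity_mentions(text):
        print(f"{anchor!r} -> {entity!r}")
```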
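The disentanglement idea in Section 5.1 rests on decoding each sentence from its own syntactic representation combined with the semantic representation of its paraphrase, so that meaning must flow through the shared variable while surface form stays in the per-sentence one. The toy sketch below shows this swapped-reconstruction signal with stand-in linear encoders and a reconstruction loss; the module names, dimensions, and loss are assumptions for illustration and not the thesis's VGVAE implementation.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the semantic / syntactic encoders and the decoder;
# dimensions are arbitrary and chosen only for this sketch.
EMB, SEM, SYN = 32, 16, 16

sem_encoder = nn.Linear(EMB, SEM)
syn_encoder = nn.Linear(EMB, SYN)
decoder = nn.Linear(SEM + SYN, EMB)


def swapped_reconstruction_loss(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Reconstruct each sentence embedding from ITS OWN syntax vector and
    the OTHER sentence's semantic vector. Because paraphrases share meaning
    but not form, meaning is pushed into the shared semantic variable and
    surface form into the per-sentence syntactic variable."""
    sem1, sem2 = sem_encoder(x1), sem_encoder(x2)
    syn1, syn2 = syn_encoder(x1), syn_encoder(x2)
    recon1 = decoder(torch.cat([sem2, syn1], dim=-1))  # meaning from x2, form from x1
    recon2 = decoder(torch.cat([sem1, syn2], dim=-1))  # meaning from x1, form from x2
    return nn.functional.mse_loss(recon1, x1) + nn.functional.mse_loss(recon2, x2)


if __name__ == "__main__":
    x1, x2 = torch.randn(4, EMB), torch.randn(4, EMB)  # a batch of paraphrase pairs
    print(swapped_reconstruction_loss(x1, x2).item())
```

The same swapping trick underlies the controlled paraphrasing of Section 5.2, where the syntactic variable is taken from an exemplar sentence rather than from the input.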
This paper is available on arXiv under a CC 4.0 license.