Author:
(1) Mingda Chen.
This chapter describes our contributions to building evaluation tasks from naturally occurring textual resources. In Section 6.1, we cast the generation of arbitrary Wikipedia sections as a data-to-text generation problem, leveraging different data sources to construct tabular data for a given section text. In Sections 6.2 and 6.3, we use fan-contributed websites to create summarization and story generation datasets. Because of the rich information these websites provide, the resulting datasets pose unique challenges in their respective task settings.
The material in this chapter is adapted from Chen et al. (2022a), Chen et al. (2021), and Chen and Gimpel (2021).
This paper is available on arXiv under a CC 4.0 license.