The Why and How of Dataset Creation by @textmodels



Too Long; Didn't Read

This section prompts dataset creators to clarify the purpose, team, funding sources, and any additional comments regarding the creation of the dataset, promoting transparency and understanding of its origins.


(1) TIMNIT GEBRU, Black in AI;

(2) JAMIE MORGENSTERN, University of Washington;

(3) BRIANA VECCHIONE, Cornell University;

(4) JENNIFER WORTMAN VAUGHAN, Microsoft Research;

(5) HANNA WALLACH, Microsoft Research;

(6) HAL DAUMÉ III, Microsoft Research; University of Maryland;

(7) KATE CRAWFORD, Microsoft Research.

1 Introduction

1.1 Objectives

2 Development Process

3 Questions and Workflow

3.1 Motivation

3.2 Composition

3.3 Collection Process

3.4 Preprocessing/cleaning/labeling

3.5 Uses

3.6 Distribution

3.7 Maintenance

4 Impact and Challenges

Acknowledgments and References


3.1 Motivation

The questions in this section are primarily intended to encourage dataset creators to clearly articulate their reasons for creating the dataset and to promote transparency about funding interests. The latter may be particularly relevant for datasets created for research purposes.

• For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? Please provide a description.

• Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?

• Who funded the creation of the dataset? If there is an associated grant, please provide the name of the grantor and the grant name and number.

• Any other comments?

This paper is available on arXiv under a CC 4.0 license.