paint-brush
The Why and How of Dataset Creationby@textmodels
123 reads

The Why and How of Dataset Creation

by Writings, Papers and Blogs on Text Models
Writings, Papers and Blogs on Text Models HackerNoon profile picture

Writings, Papers and Blogs on Text Models

@textmodels

We publish the best academic papers on rule-based techniques, LLMs,...

June 10th, 2024
Read on Terminal Reader
Read this story in a terminal
Print this story
Read this story w/o Javascript
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

This section prompts dataset creators to clarify the purpose, team, funding sources, and any additional comments regarding the creation of the dataset, promoting transparency and understanding of its origins.
featured image - The Why and How of Dataset Creation
1x
Read by Dr. One voice-avatar

Listen to this story

Writings, Papers and Blogs on Text Models HackerNoon profile picture
Writings, Papers and Blogs on Text Models

Writings, Papers and Blogs on Text Models

@textmodels

We publish the best academic papers on rule-based techniques, LLMs, & the generation of text that resembles human text.

About @textmodels
LEARN MORE ABOUT @TEXTMODELS'S
EXPERTISE AND PLACE ON THE INTERNET.
0-item

STORY’S CREDIBILITY

Academic Research Paper

Academic Research Paper

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Authors:

(1) TIMNIT GEBRU, Black in AI;

(2) JAMIE MORGENSTERN, University of Washington;

(3) BRIANA VECCHIONE, Cornell University;

(4) JENNIFER WORTMAN VAUGHAN, Microsoft Research;

(5) HANNA WALLACH, Microsoft Research;

(6) HAL DAUMÉ III, Microsoft Research; University of Maryland;

(7) KATE CRAWFORD, Microsoft Research.

1 Introduction

1.1 Objectives

2 Development Process

3 Questions and Workflow

3.1 Motivation

3.2 Composition

3.3 Collection Process

3.4 Preprocessing/cleaning/labeling

3.5 Uses

3.6 Distribution

3.7 Maintenance

4 Impact and Challenges

Acknowledgments and References

Appendix

3.1 Motivation

The questions in this section are primarily intended to encourage dataset creators to clearly articulate their reasons for creating the dataset and to promote transparency about funding interests. The latter may be particularly relevant for datasets created for research purposes.


• For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? Please provide a description.


• Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?


• Who funded the creation of the dataset? If there is an associated grant, please provide the name of the grantor and the grant name and number.


• Any other comments?


This paper is available on arxiv under CC 4.0 license.


L O A D I N G
. . . comments & more!

About Author

Writings, Papers and Blogs on Text Models HackerNoon profile picture
Writings, Papers and Blogs on Text Models@textmodels
We publish the best academic papers on rule-based techniques, LLMs, & the generation of text that resembles human text.

TOPICS

THIS ARTICLE WAS FEATURED IN...

Permanent on Arweave
Read on Terminal Reader
Read this story in a terminal
 Terminal
Read this story w/o Javascript
Read this story w/o Javascript
 Lite
Also published here
X REMOVE AD