The Why and How of Dataset Creation

June 10th, 2024

About Author

Writings, Papers and Blogs on Text Models (@textmodels)

We publish the best academic papers on rule-based techniques, LLMs, & the generation of text that resembles human text.

TOPICS

#machine-learning #ai-training-data #data-provenance #mitigating-bias-in-ai #ai-transparency #ai-ethics #machine-learning-models #datasheets-for-datasets #ai-data-documentation
