Using Language Models to Simulate Human Samples: Appendix

Written by textmodels | Published 2024/06/11

TLDR

Datasheets for datasets have gained traction across academic and industry settings, fostering transparency and accountability. While implementation challenges exist, the benefits of improved communication and accountability outweigh the costs, driving adoption and evolution in dataset creation practices.

Authors:

(1) TIMNIT GEBRU, Black in AI;

(2) JAMIE MORGENSTERN, University of Washington;

(3) BRIANA VECCHIONE, Cornell University;

(4) JENNIFER WORTMAN VAUGHAN, Microsoft Research;

(5) HANNA WALLACH, Microsoft Research;

(6) HAL DAUMÉ III, Microsoft Research; University of Maryland;

(7) KATE CRAWFORD, Microsoft Research.

Table of Links

2 Development Process

3 Questions and Workflow

3.2 Composition

3.3 Collection Process

3.4 Preprocessing/cleaning/labeling

3.6 Distribution

3.7 Maintenance

4 Impact and Challenges

Acknowledgments and References

A Appendix

In this appendix, we provide an example datasheet for Pang and Lee’s polarity dataset [22] (Figures 1 to 4).

This paper is available on arXiv under a CC 4.0 license.


Published by HackerNoon on 2024/06/11