Holistic Evaluation of Text-to-Image Models: Human evaluation procedure

Written by autoencoder | Published 2024/10/13

TL;DR: This section covers the process of obtaining human feedback on AI-generated images using Amazon Mechanical Turk. The project required annotators to meet specific qualifications and paid workers based on an hourly wage. The survey and annotation process involved compliance with Institutional Review Board (IRB) standards.

Authors:

(1) Tony Lee, Stanford (equal contribution);

(2) Michihiro Yasunaga, Stanford (equal contribution);

(3) Chenlin Meng, Stanford (equal contribution);

(4) Yifan Mai, Stanford;

(5) Joon Sung Park, Stanford;

(6) Agrim Gupta, Stanford;

(7) Yunzhi Zhang, Stanford;

(8) Deepak Narayanan, Microsoft;

(9) Hannah Benita Teufel, Aleph Alpha;

(10) Marco Bellagente, Aleph Alpha;

(11) Minguk Kang, POSTECH;

(12) Taesung Park, Adobe;

(13) Jure Leskovec, Stanford;

(14) Jun-Yan Zhu, CMU;

(15) Li Fei-Fei, Stanford;

(16) Jiajun Wu, Stanford;

(17) Stefano Ermon, Stanford;

(18) Percy Liang, Stanford.

Table of Links

Abstract and 1 Introduction

2 Core framework

3 Aspects

4 Scenarios

5 Metrics

6 Models

7 Experiments and results

8 Related work

9 Conclusion

10 Limitations

Author contributions, Acknowledgments and References

A Datasheet

B Scenario details

C Metric details

D Model details

E Human evaluation procedure

E Human evaluation procedure

E.1 Amazon Mechanical Turk

We used the Amazon Mechanical Turk (MTurk) platform to collect human feedback on the AI-generated images. Following [35], we applied the following worker-requirement filters when creating the MTurk project: 1) Maturity: workers are over 18 years old and have agreed to work with potentially offensive content; 2) Master: workers are well-performing and have been granted the AMT Masters qualification. We required five different annotators per sample. Figure 6 shows the design layout of the survey.
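The worker filters and the five-annotator requirement described above could be expressed roughly as follows. This is a minimal sketch, not the authors' actual project configuration: the dictionary keys mirror the `MaxAssignments` and `QualificationRequirements` parameters of the MTurk `CreateHIT` operation, the helper `build_hit_settings` is hypothetical, and the two qualification type IDs are the production values as we understand them from the MTurk documentation, so they should be verified before use.

```python
# Sketch only: builds the settings as plain dictionaries and makes no API call.
# ASSUMPTION: system qualification type IDs taken from the MTurk documentation
# (production environment); verify them before relying on this.
ADULT_QUALIFICATION_ID = "00000000000000000060"
MASTERS_QUALIFICATION_ID = "2F1QJWKUDD8XADTFD2Q0G6UTO95ALH"


def build_hit_settings(annotators_per_sample: int = 5) -> dict:
    """Hypothetical helper: worker filters plus annotators per sample."""
    return {
        # Each assignment of a HIT goes to a different worker, so this
        # yields five distinct annotators per sample.
        "MaxAssignments": annotators_per_sample,
        "QualificationRequirements": [
            {
                # 1) Maturity: adult workers who opted in to offensive content.
                "QualificationTypeId": ADULT_QUALIFICATION_ID,
                "Comparator": "EqualTo",
                "IntegerValues": [1],
            },
            {
                # 2) Master: worker holds the Masters qualification.
                "QualificationTypeId": MASTERS_QUALIFICATION_ID,
                "Comparator": "Exists",
            },
        ],
    }


settings = build_hit_settings()
print(settings["MaxAssignments"], len(settings["QualificationRequirements"]))
```

With an MTurk client, these entries would be passed alongside the task's title, reward, and question payload, none of which are specified in this section.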

Based on a wage of $16 per hour, each annotator was paid $0.02 for answering a single multiple-choice question. The total amount spent on human annotations was $13,433.55.
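The two pay figures together imply a time budget per question. The short calculation below makes that explicit; the resulting 4.5 seconds per question is our inference from the stated wage and per-question payment, not a number reported in the text, and the variable names are illustrative.

```python
from decimal import Decimal

# Figures stated in the text.
HOURLY_WAGE = Decimal("16.00")       # dollars per hour
PAY_PER_QUESTION = Decimal("0.02")   # dollars per multiple-choice question

# Number of questions a worker must answer per hour to earn the hourly wage.
questions_per_hour = HOURLY_WAGE / PAY_PER_QUESTION

# Implied time budget for a single question, in seconds.
seconds_per_question = Decimal(3600) / questions_per_hour

print(f"questions per hour: {questions_per_hour}")
print(f"seconds per question: {seconds_per_question}")
```

In other words, the per-question rate matches the hourly wage if a question takes about 4.5 seconds on average, or 800 questions per hour.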

E.2 Human Subjects Institutional Review Board (IRB)

This paper is available on arXiv under the CC BY 4.0 DEED license.


Published by HackerNoon on 2024/10/13