Authors:
(1) Tony Lee, Stanford (equal contribution);
(2) Michihiro Yasunaga, Stanford (equal contribution);
(3) Chenlin Meng, Stanford (equal contribution);
(4) Yifan Mai, Stanford;
(5) Joon Sung Park, Stanford;
(6) Agrim Gupta, Stanford;
(7) Yunzhi Zhang, Stanford;
(8) Deepak Narayanan, Microsoft;
(9) Hannah Benita Teufel, Aleph Alpha;
(10) Marco Bellagente, Aleph Alpha;
(11) Minguk Kang, POSTECH;
(12) Taesung Park, Adobe;
(13) Jure Leskovec, Stanford;
(14) Jun-Yan Zhu, CMU;
(15) Li Fei-Fei, Stanford;
(16) Jiajun Wu, Stanford;
(17) Stefano Ermon, Stanford;
(18) Percy Liang, Stanford.

Table of Links

Abstract and 1 Introduction
2 Core framework
3 Aspects
4 Scenarios
5 Metrics
6 Models
7 Experiments and results
8 Related work
9 Conclusion
10 Limitations
Author contributions, Acknowledgments and References
A Datasheet
B Scenario details
C Metric details
D Model details
E Human evaluation procedure

4 Scenarios

To evaluate the 12 aspects (§3), we curate diverse and practical scenarios. Table 2 presents an overview of all the scenarios and their descriptions.

Each scenario is a set of textual inputs and can be used to evaluate certain aspects. For instance, the “MS-COCO” scenario can be used to assess the alignment, quality, and efficiency aspects, and the “Inappropriate Image Prompts (I2P)” scenario [8] can be used to assess the toxicity aspect. Some scenarios may include sub-scenarios, indicating the sub-level categories or variations within them, such as “Hate” and “Violence” within I2P.

We curate these scenarios by leveraging existing datasets and creating new prompts ourselves. In total, we have 62 scenarios, including the sub-scenarios. Notably, we create new scenarios (indicated with “New” in Table 2) for aspects that were previously underexplored and lacked dedicated datasets. These aspects include originality, aesthetics, bias, and fairness. For example, to evaluate originality, we develop scenarios to test the artistic creativity of these models with textual inputs to generate landing pages, logos, and magazine covers.

This paper is available on arxiv under CC BY 4.0 DEED license.
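
The relationship described in this section (a scenario is a set of text prompts, tagged with the aspects it can evaluate, and optionally split into sub-scenarios) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions, not the benchmark's actual code: the class name `Scenario`, the helper `scenarios_for`, the field names, the "Logos" scenario name, and every prompt string are hypothetical and chosen purely for illustration. Only the scenario-to-aspect pairings for MS-COCO and I2P, the "Hate" and "Violence" sub-scenarios, and the use of logo prompts for originality come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scenario:
    """A named set of text prompts and the aspects it is used to evaluate."""
    name: str
    aspects: list[str]
    prompts: list[str] = field(default_factory=list)
    sub_scenarios: list[Scenario] = field(default_factory=list)
    is_new: bool = False  # mirrors the "New" marker the text says Table 2 uses


def scenarios_for(aspect: str, scenarios: list[Scenario]) -> list[str]:
    """Return names of scenarios and sub-scenarios tagged with `aspect`."""
    names = []
    for scenario in scenarios:
        if aspect in scenario.aspects:
            names.append(scenario.name)
        for sub in scenario.sub_scenarios:
            if aspect in sub.aspects:
                names.append(f"{scenario.name}/{sub.name}")
    return names


# All prompt strings below are placeholders, not taken from any dataset.
SCENARIOS = [
    Scenario(
        name="MS-COCO",
        aspects=["alignment", "quality", "efficiency"],
        prompts=["<placeholder caption>"],
    ),
    Scenario(
        name="I2P",
        aspects=["toxicity"],
        sub_scenarios=[
            Scenario(name="Hate", aspects=["toxicity"]),
            Scenario(name="Violence", aspects=["toxicity"]),
        ],
    ),
    # Hypothetical name for one of the newly created originality scenarios.
    Scenario(
        name="Logos",
        aspects=["originality"],
        prompts=["<placeholder logo prompt>"],
        is_new=True,
    ),
]

if __name__ == "__main__":
    print(scenarios_for("toxicity", SCENARIOS))
    print(scenarios_for("alignment", SCENARIOS))
```

Tagging each scenario with a list of aspects, rather than storing scenarios under a single aspect, reflects the point that one scenario (such as MS-COCO) can serve several aspects at once.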


Curating 62 Practical Scenarios to Test AI Text-to-Image Models
