Authors:
(1) Wenxuan Wang, The Chinese University of Hong Kong, Hong Kong, China;
(2) Haonan Bai, The Chinese University of Hong Kong, Hong Kong, China;
(3) Jen-tse Huang, The Chinese University of Hong Kong, Hong Kong, China;
(4) Yuxuan Wan, The Chinese University of Hong Kong, Hong Kong, China;
(5) Youliang Yuan, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China;
(6) Haoyi Qiu, University of California, Los Angeles, Los Angeles, USA;
(7) Nanyun Peng, University of California, Los Angeles, Los Angeles, USA;
(8) Michael Lyu, The Chinese University of Hong Kong, Hong Kong, China.
4.3 RQ2: Validity of Identified Biases
In this RQ, we investigate, via manual inspection, whether the biased behaviors exposed by BiasPainter are valid.
The part of BiasPainter most prone to error is bias identification, where several AI methods and APIs are used to evaluate changes in race, gender, and age. To ensure that the social biases detected by BiasPainter are genuine, we perform a manual inspection of the bias identification process. Specifically, we recruit two annotators, both of whom hold a bachelor's degree and are proficient in English, to annotate the (seed image, generated image) pairs.
For age, we randomly select 10, 10, and 20 (seed image, generated image) pairs identified by BiasPainter as becoming older (image age bias score > 1), becoming younger (image age bias score < -1), and showing no significant change in age (-0.2 < image age bias score < 0.2), respectively. For each pair, annotators answer a multiple-choice question: A. person 2 is older than person 1; B. person 2 is younger than person 1; C. there is no significant difference in age between person 2 and person 1.
For gender, we randomly select 10, 10, and 20 (seed image, generated image) pairs identified by BiasPainter as female-to-male (image gender bias score = -1), male-to-female (image gender bias score = 1), and no change in gender (image gender bias score = 0), respectively. For each pair, annotators answer a multiple-choice question: A. person 1 is male and person 2 is male; B. person 1 is male and person 2 is female; C. person 1 is female and person 2 is male; D. person 1 is female and person 2 is female.
For race, we randomly select 10, 10, and 20 (seed image, generated image) pairs identified by BiasPainter as becoming lighter (image race bias score > 1), becoming darker (image race bias score < -1), and showing no significant change in skin tone (-0.2 < image race bias score < 0.2), respectively. For each pair, annotators answer a multiple-choice question: A. the skin tone of person 2 is lighter than that of person 1; B. the skin tone of person 2 is darker than that of person 1; C. there is no significant difference in skin tone between person 2 and person 1.
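To make the sampling protocol concrete, the following Python sketch selects validation pairs for each dimension by thresholding the bias scores as described above. The data layout, field names, and the `sample_for_validation` helper are our own illustrative assumptions; the paper does not specify BiasPainter's internal data structures.

```python
import random

# Hypothetical record layout for a (seed image, generated image) pair and the
# three bias scores BiasPainter assigns to it. The field names and synthetic
# values are illustrative only.
pairs = [
    {"seed": f"seed_{i}.png", "generated": f"gen_{i}.png",
     "age_score": random.uniform(-2, 2),
     "gender_score": random.choice([-1, 0, 1]),
     "race_score": random.uniform(-2, 2)}
    for i in range(500)
]

def sample_for_validation(pairs, key, is_positive, is_negative, is_neutral,
                          n_biased=10, n_neutral=20):
    """Draw 10 + 10 + 20 pairs for one dimension, mirroring the protocol above."""
    pos = [p for p in pairs if is_positive(p[key])]
    neg = [p for p in pairs if is_negative(p[key])]
    mid = [p for p in pairs if is_neutral(p[key])]
    return (random.sample(pos, n_biased),
            random.sample(neg, n_biased),
            random.sample(mid, n_neutral))

# Age: older (score > 1), younger (score < -1), no change (-0.2 < score < 0.2).
older, younger, same_age = sample_for_validation(
    pairs, "age_score",
    is_positive=lambda s: s > 1,
    is_negative=lambda s: s < -1,
    is_neutral=lambda s: -0.2 < s < 0.2,
)

# Gender: male-to-female (score = 1), female-to-male (score = -1), no change (score = 0).
m2f, f2m, same_gender = sample_for_validation(
    pairs, "gender_score",
    is_positive=lambda s: s == 1,
    is_negative=lambda s: s == -1,
    is_neutral=lambda s: s == 0,
)

# Race: lighter (score > 1), darker (score < -1), no change (-0.2 < score < 0.2).
lighter, darker, same_tone = sample_for_validation(
    pairs, "race_score",
    is_positive=lambda s: s > 1,
    is_negative=lambda s: s < -1,
    is_neutral=lambda s: -0.2 < s < 0.2,
)
```

Under this scheme, each dimension contributes 40 pairs (10 + 10 + 20), giving 120 annotated pairs in total across age, gender, and race.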
The annotators first label the pairs independently; they then discuss the results and resolve any disagreements to produce a consensus annotation. By comparing BiasPainter's identification results against the consensus annotations, we calculate its accuracy. BiasPainter achieves an accuracy of 90.8%, indicating that its bias identification results are reliable.
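The accuracy computation itself reduces to comparing two label sequences. Below is a minimal sketch, assuming BiasPainter's outputs and the consensus annotations are mapped to the same categorical labels; the category codes and values shown are toy data, not the paper's results.

```python
# Consensus human labels vs. BiasPainter's automatic labels for the sampled
# pairs; the values below are purely illustrative.
human_labels = ["older", "younger", "no_change", "older", "no_change"]
tool_labels  = ["older", "younger", "no_change", "no_change", "no_change"]

matches = sum(h == t for h, t in zip(human_labels, tool_labels))
accuracy = matches / len(human_labels)
print(f"BiasPainter accuracy: {accuracy:.1%}")  # 80.0% on this toy data
```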
This paper is available on arXiv under the CC0 1.0 DEED license.