Authors:
(1) Wenxuan Wang, The Chinese University of Hong Kong, Hong Kong, China;
(2) Haonan Bai, The Chinese University of Hong Kong, Hong Kong, China;
(3) Jen-tse Huang, The Chinese University of Hong Kong, Hong Kong, China;
(4) Yuxuan Wan, The Chinese University of Hong Kong, Hong Kong, China;
(5) Youliang Yuan, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China;
(6) Haoyi Qiu, University of California, Los Angeles, Los Angeles, USA;
(7) Nanyun Peng, University of California, Los Angeles, Los Angeles, USA;
(8) Michael Lyu, The Chinese University of Hong Kong, Hong Kong, China.
Image generation models, which generate images from a given text, have recently drawn substantial interest from both academia and industry. For example, Stable Diffusion [37], an open-source latent text-to-image diffusion model, has 60K stars on GitHub [1], and Midjourney, a commercial AI image generation product launched in July 2022, has more than 15 million users [13]. Conditioned on a textual description, these models can produce high-quality images depicting a wide variety of concepts and styles, and they can significantly facilitate content creation and publication.
Despite their extraordinary capability to generate vivid images, image generation models are prone to producing content with social bias, stereotypes, and even hate. For example, previous work has found that text-to-image generation models tend to associate males with software engineers, females with housekeepers, and white people with attractive people [3]. This is because these models are trained on massive datasets of images and text scraped from the web, which are known to contain stereotyped, biased, and toxic content [21]. The ramifications of such biased content are far-reaching, from reinforcing stereotypes to damaging brands and reputations and even harming individual well-being.
To mitigate social bias and stereotypes in image generation models, an essential step is to trigger the biased content and evaluate the fairness of such models. Previous works have designed methods to evaluate the bias in image generation models. For example, [12] generates images from a set of words that should not be related to a specific gender or race (e.g., secretary, rich person). Then, using CLIP [35], a pre-trained image-text alignment model, the authors classify the generated images into gender and race categories. However, these works suffer from several drawbacks. First, their accuracy is relatively low: detecting race and gender is not easy, given the high diversity of generated styles and content. According to their human evaluation results, the detection method is inaccurate (e.g., [12] achieves only 40% accuracy on race), raising concerns about the effectiveness and soundness of the method. Second, they do not scale. Previous work [3] relies on human annotation to evaluate the bias in generated images, aiming for an accurate evaluation, but this manual method requires extensive human effort and is not scalable. Third, there is a lack of comprehensive evaluation across demographic groups. According to a previous study [3], more than 90% of the images produced by image generation models depict white people, which leaves bias against other groups, such as East Asian and Black people, largely unevaluated.
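To make the prior-work setup concrete, the following sketch shows how a generated image can be classified into gender categories with CLIP's zero-shot image-text matching, here via the Hugging Face transformers API. The checkpoint name, label phrasing, and file name are illustrative assumptions, not the exact configuration used in [12].

```python
# Minimal sketch of CLIP-based zero-shot classification of a generated image
# into gender categories (illustrative; not the exact pipeline of [12]).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a man", "a photo of a woman"]  # assumed label phrasing
image = Image.open("generated_secretary.png")        # hypothetical file name

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape: (1, num_labels)
print(labels[probs.argmax(dim=-1).item()], probs.tolist())
```

As the low reported accuracy suggests, such classifiers struggle on stylized or diverse generations, which motivates the image-editing-based design described next.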
In this paper, we design a novel metamorphic testing framework, BiasPainter, that can automatically, comprehensively, and accurately measure the social bias in image generation models. In particular, BiasPainter operates by feeding both images and text (prompts) into these models and then analyzing how the models edit the images. Under this setting, we define a metamorphic relation: when prompted with gender-, race-, and age-neutral prompts, the gender, race, and age of the person in the original image should not be significantly altered after editing. Specifically, BiasPainter first collects photos of people across different races, genders, and ages as seed images. Then, it prompts the model under test to edit each seed image, using prompts selected from a pre-defined, comprehensive gender-, race-, and age-neutral prompt list covering professions, objects, personalities, and activities. After that, BiasPainter measures the changes in race, gender, and age from the seed image to the generated image. Ideally, race, gender, and age should not change significantly under editing with a gender-, race-, and age-neutral prompt. Conversely, if a model consistently makes a significant change (e.g., increasing the age of the person in the original image) under a specific prompt (e.g., "a photo of a mean person"), BiasPainter reports a biased association (e.g., between being elderly and being mean).
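The sketch below illustrates this metamorphic testing loop under stated assumptions: `edit_image` stands in for the image generation model under test, and `estimate_attributes` stands in for BiasPainter's gender/race/age assessment of an image. Both helpers and the age tolerance threshold are hypothetical placeholders rather than part of a released API.

```python
# Sketch of BiasPainter's metamorphic testing loop (assumed interfaces).
from itertools import product

def find_bias_violations(seed_images, neutral_prompts,
                         edit_image, estimate_attributes, age_tolerance=10):
    """Flag cases where a neutral prompt changes gender, race, or age."""
    violations = []
    for seed, prompt in product(seed_images, neutral_prompts):
        edited = edit_image(seed, prompt)               # model under test edits the seed image
        g0, r0, a0 = estimate_attributes(seed)          # (gender, race, age) before editing
        g1, r1, a1 = estimate_attributes(edited)        # (gender, race, age) after editing
        # Metamorphic relation: a neutral prompt should preserve all three attributes.
        if g0 != g1 or r0 != r1 or abs(a0 - a1) > age_tolerance:
            violations.append((seed, prompt, (g0, r0, a0), (g1, r1, a1)))
    return violations
```

Aggregating violations per prompt (rather than per image) is what allows consistent shifts, such as a prompt that systematically ages the subject, to be reported as biased associations.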
To evaluate the effectiveness of BiasPainter, we conduct experiments on five widely deployed image generation models and software products: Stable Diffusion 1.5, Stable Diffusion 2.1, Stable Diffusion XL, Midjourney, and InstructPix2Pix. To account for the stochastic nature of generative AI, we adopt three photos for each combination of gender, race, and age, yielding 54 seed images, and we use 228 prompts to edit each seed image. For each image generation model, we thus generate 54 × 228 = 12,312 images to evaluate its bias. The results show that 100% of the test cases generated by BiasPainter can successfully trigger social bias in image generation models. In addition, based on human evaluation, BiasPainter achieves an accuracy of 90.8% in identifying bias in images, which is significantly higher than the accuracy reported in previous work (40% on race) [12]. Furthermore, BiasPainter offers valuable insights into the nature and extent of biases within these models and serves as a tool for evaluating bias mitigation strategies, aiding developers in improving model fairness.
We summarize the main contributions of this work as follows:
• We design and implement BiasPainter, the first metamorphic testing framework for comprehensively measuring the social biases in image generation models.
• We perform an extensive evaluation of BiasPainter on five widely deployed image generation models and software products. The results demonstrate that BiasPainter can effectively trigger a massive amount of biased behavior.
• We release the dataset, the code of BiasPainter, and all experimental results, which can facilitate real-world fairness testing tasks and further follow-up research.
Content Warning: We apologize that this article presents examples of biased images to demonstrate the results of our method. For the mental health of the participating researchers and annotators, we displayed a content warning at every stage of this work and informed them that they were free to leave at any time. After the study, we provided psychological counseling to relieve their mental stress.
This paper is available on arxiv under CC0 1.0 DEED license.
[1] https://github.com/CompVis/stable-diffusion