Examining Gender Bias in OpenAI's GPT-2 Language Model
A man and his son are in a terrible accident and are rushed to the hospital for critical care. The doctor looks at the boy and exclaims “I can’t operate on this boy, he’s my son!”. How could this be?
The answer? The doctor is the boy’s mother.
My answer… After puzzling over this for a minute, I concluded that the boy had two fathers. Though I don’t entirely dislike my answer (we have a bias towards heteronormative relationships), I only came to this conclusion because my brain couldn’t compute the idea of the doctor being a woman. To make this worse, I work on algorithmic bias… and the question was posed at a ‘Women Like Me
Bias is all around us in society and in each and every one of us. When we build AI we run the risk of making something that reflects those biases, and depending on the way we interact with the technology, reinforces or amplifies them.
In February, OpenAI announced GPT-2, a generative language model which took the internet by storm, partly through its creation of convincing synthetic text, but also because of concerns around the model’s safety. One of those concerns was bias.
“We expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research,” OpenAI Charter
Nine months on, and OpenAI have steadily followed a phased release strategy, carefully monitoring the models’ use, publishing preliminary results on the models’ bias in their 6-month update, and now (just over a week ago!) releasing the full model.
In this blog, we are going to take a deeper look into bias in GPT-2. Specifically, we will look at occupational gender bias, compare it to pre-existing biases in society and discuss why bias in language models matters.
The goal of our experiment was to measure occupational gender bias in GPT-2, see how the bias changes with different sized models and compare this bias to the bias in our society. Our experiment takes some inspiration from the ‘Word Embedding Factual Association Test’ (Caliskan et al.), a test akin to the ‘Implicit Association Test’, but measured against factual data, the ‘factual association’. Our factual data comes from the ‘Office for National Statistics’ (ONS) and their UK occupational data: a list of around 500 job categories, each listing the number of men and women employed in that occupation and the average salary.
We ran a series of prompts through the various GPT-2 models (124m, 355m, 774m and 1.5bn parameters) to measure the gender association each model gave to various job titles found in the ONS occupational data.
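To make that measurement a little more concrete, here is a minimal sketch of how a gender association could be scored for one job title. It is an illustration rather than the code behind our results: the prompt templates, the `pronoun_prob` callback (a stand-in for whatever returns a model's probability that a given pronoun follows a prompt) and the normalised-difference formula are all assumptions. The sign convention matches the graphs in this post: above 0 is male-biased, below 0 is female-biased.

```python
from typing import Callable, Sequence


def gender_bias_score(male_weight: float, female_weight: float) -> float:
    """Signed score in [-100, 100]; positive leans male, negative leans female.

    ASSUMPTION: a normalised difference. The post does not state the
    exact formula used in the experiment.
    """
    total = male_weight + female_weight
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return 100.0 * (male_weight - female_weight) / total


def model_bias_for_job(
    job_title: str,
    prompt_templates: Sequence[str],
    pronoun_prob: Callable[[str, str], float],
) -> float:
    """Average the bias score for one job title over several prompts.

    `pronoun_prob(prompt, pronoun)` is a hypothetical callback that
    returns the model's probability of `pronoun` following `prompt`.
    """
    if not prompt_templates:
        raise ValueError("need at least one prompt template")
    scores = []
    for template in prompt_templates:
        prompt = template.format(job=job_title)
        p_he = pronoun_prob(prompt, " he")
        p_she = pronoun_prob(prompt, " she")
        scores.append(gender_bias_score(p_he, p_she))
    return sum(scores) / len(scores)
```

Under the same assumption, `gender_bias_score` could also be fed the ONS headcounts of men and women in a job category, which would put the societal data and the model on one scale.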
To help you understand our experiment, I’d like you to imagine you’re at a school fair. At the fair, one of the stalls has a jar full of jelly beans. Hundreds of them! Thousands, maybe? Too many to count at any rate. You make a guess, write it down on a piece of paper, post it in a little box and cross your fingers.
At the end of the day, two of the students running the stall look through all the guesses and they notice something strange. Though none of these people knew the exact number of jelly beans in the jar, and everyone who guessed held their own biases as to how many beans there are, if you put all the guesses together and take their average you get something very close to the number of jelly beans in the jar.
Just like participants in the jelly beans game, GPT-2 doesn’t have access to the exact number of jelly beans (or rather, it has not learned the societal bias from the ONS data). Instead, we’re seeing whether GPT-2 reflects the societal bias by learning from the language of a whole lot of people.
This is what we discovered!
The X-axis in this graph shows the salaries of different jobs in the UK. On the Y-axis we are measuring gender bias, with numbers above 0 denoting male-bias and those below 0 female-bias. In the case of the ONS data, this plots the actual number of people working in various careers and their salaries. For GPT-2, we are looking at the strength of the gender bias that GPT-2 associates with those same jobs.
All four GPT-2 models and the societal data show a trend towards greater male bias as salaries increase, meaning the more senior the job, and the more it pays, the more likely GPT-2 is to suggest that a man is working in that position. The ONS data also shows that this occupational gender bias towards men in higher-paid jobs is even stronger in the UK employment market than in GPT-2.
The trend as we add more parameters to GPT-2 is really promising. The more parameters we add, the closer the model gets to the gender-neutral zero line. The 1.5bn parameter version of the model is both the closest to zero and has the shallowest gradient, indicating the lowest tendency towards male bias as salaries increase. Of all the trend lines, UK society, based on the ONS data, is the most male-biased and shows the most prominent trend towards male bias as salaries increase.
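For readers who want to see what ‘gradient’ means here, the sketch below fits a straight trend line through (salary, bias) points using ordinary least squares. This is an assumption on our part about how such a line could be drawn, not a description of the exact fitting method behind the graph, and the function name is hypothetical. A slope near zero means bias barely changes with salary; a larger positive slope means bias shifts towards men as pay rises.

```python
from typing import Sequence, Tuple


def fit_trend_line(
    salaries: Sequence[float], bias_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line bias = slope * salary + intercept.

    ASSUMPTION: ordinary least squares; the post does not say how its
    trend lines were fitted.
    """
    n = len(salaries)
    if n != len(bias_scores) or n < 2:
        raise ValueError("need two or more matching (salary, bias) points")
    mean_x = sum(salaries) / n
    mean_y = sum(bias_scores) / n
    # Spread of salaries around their mean; zero means all salaries are equal.
    sxx = sum((x - mean_x) ** 2 for x in salaries)
    if sxx == 0:
        raise ValueError("salaries must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(salaries, bias_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy placeholder points, NOT results from the experiment.
toy_salaries = [20000.0, 30000.0, 40000.0]
toy_bias = [0.0, 10.0, 20.0]
slope, intercept = fit_trend_line(toy_salaries, toy_bias)
```

Comparing the slope for each model size against the slope for the ONS data would be one way to put numbers on ‘weakest gradient’.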
Typically we would expect an algorithm to get closer to the ground truth by feeding it with more data or training it for longer, but GPT-2 seems to be doing the opposite. So, why is this?
Remember the jelly beans! GPT-2 was never given the ONS data to train from. Instead, it has learned from the language of millions of people online. Though each person has their own bias which may be some distance from the societal truth, overall it’s astonishing how close GPT-2 has found itself to the societal bias.
Not only has GPT-2 learned from the average of individual biases, but it has also learned from the bias in their language specifically. Understanding this, we might expect that gender-stereotyped jobs show a different trend. So let’s try that…
In this graph we can see a subset of the full results, picking out examples of jobs stereotypically associated with women. The trend is much closer to the societal bias than we saw in the previous graph. We found the 774m model to be astoundingly close to the societal bias, with roles like ‘Nursing Assistant’ being 77.4% more likely to be associated with a female than a male pronoun in the model and 77.3% more likely in society. Even with these stereotyped examples, the 1.5bn parameter model still shows a tendency towards gender-neutrality.
A fair criticism here is that we cherry-picked the stereotypically female jobs to support a hypothesis. It’s not easy to find a standard classifier for ‘gender-stereotyped jobs’ and lists online are broadly made up of other people’s judgement. To be as fair as possible, our selection was based on a list from the paper ‘Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings’. We took job titles from their ‘Extreme she occupations’ list, excluding those which lack full ONS stats. We also added a few job titles (e.g. Midwife and Nursery Teacher) based on the judgement of our team and the stereotypes we have experienced.
We repeated the process for male-stereotyped jobs and found again that the 1.5bn parameter model was the closest to gender-neutral. The model does, however, almost universally have a male bias in these roles across all model sizes.
What did we learn?
The words you use in the prompt really matter!
Our first lesson is inspired by the challenge we faced in creating accessible job titles for the model. To help explain this, join me in a quick round of the ‘word association game’. What’s the first thing that comes into your head when you hear these ONS job categories?
School midday crossing guard?
If you’re anything like me, you found the ‘school midday crossing guard’ became a ‘Lollipop Lady’, the ‘Postal Worker’ a ‘Postman’ and the ‘Van driver’ a ‘Man with van’. We modified many of the ONS job titles from what were unambiguous, but extremely unusual, job titles to the equivalent names we would expect to hear in society. The ONS categories were just too unusual to be functional in GPT-2, and we had to take great care not to add unnecessary gender bias in the process of modifying them. Each of the three ‘real-world’ titles I described contains an explicit reference to gender and pushes GPT-2 towards that gender bias.
In some instances a job has separate male- and female-associated titles, for instance waiter vs waitress. The ONS contains statistics for the category ‘waiters and waitresses’, which is 55.8% more likely to be female than male. When we run this through the 774m parameter version of the model we find waiter is 15% male-biased and waitress is 83.6% female-biased. Averaging the two, we get 34.3% female-biased, quite close to the societal bias.
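The averaging step can be checked in a couple of lines. The sketch below uses the waiter and waitress figures quoted above, written with the sign convention from our graphs (positive for male bias, negative for female bias). The function name is hypothetical and a simple unweighted mean is assumed.

```python
from typing import Sequence


def average_signed_bias(scores: Sequence[float]) -> float:
    """Unweighted mean of signed bias scores (positive = male, negative = female)."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


waiter = 15.0      # 15% male-biased in the 774m model
waitress = -83.6   # 83.6% female-biased in the 774m model

combined = average_signed_bias([waiter, waitress])
print(round(combined, 1))  # -34.3, i.e. 34.3% female-biased
```

The unweighted mean treats both titles as equally common, which is a simplification; weighting by how often each title appears in real text would be a reasonable alternative.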
Consider the gender-neutral word for each job category. Rather than putting ‘groundsman’ in a job ad, we should advertise for a ‘groundsperson’. Rather than describing someone as a ‘draughtsman’, they’re better titled a ‘drafter’ or ‘draughtsperson’. This is equally true for the way we use GPT-2 and for the things we write ourselves. Below you can see the results for the ‘crossing guard’, which demonstrated this point most clearly. Click here to see a few more examples.
A look to the future
Whilst GPT-2 is generally reflective of existing societal biases, our application of the technology has the potential to reinforce the societal bias. Though the trend towards gender-neutrality with increasing model sizes is promising, all model sizes continue to show a level of gender bias. This matters because GPT-2 can generate plausible text at an unprecedented rate, potentially without human oversight. This may not necessarily make societal biases greater, but rather increase inertia and slow positive progress towards a less biased society. At worst, it could amplify our biases, making their effect on society more extreme. The effects of GPT-2’s bias on our society will depend on who has access to the technology and how it’s applied. This makes OpenAI’s decision to have a phased release and analyse its effects before releasing it publicly particularly valuable.
Digital assistants, which have exploded in popularity since the release of Siri in 2011, offer a harsh lesson on gender bias in technology. In UNESCO’s report ‘I’d blush if I could’ we journey through the gender-biased reality of digital assistants. Across Siri, Alexa, Cortana and the Google Assistant, we see digital assistants presented as women who are subservient to the orders that users bark at them and even brush off sexual advances as jokes. Where digital assistants fail to perform (which they often do), we mentally associate this non-performance with the women whose voices and personas these digital assistants ape. We are now just beginning to see a trend towards male/female options in digital assistants, away from female-by-default and gradually increasing the availability of gender-neutral options.
UNESCO’s report recommends that developers and other stakeholders monitor the effect that digital assistants have on users’ behaviour, with a particular focus on the ‘socialization of children and young people’. Just as we may want to restrict children’s engagement with female digital assistants to avoid them making unhealthy associations between women and subservience, we may also want to take greater care over the use of GPT-2 and other generative language models. GPT-2 itself has no persona and does not identify with a gender, but it’s only a small step to fine-tune the model and implement it as a dialogue agent on a website, for instance, to achieve the same result. Even if GPT-2 doesn’t identify with a gender, the use of gender-biased language could still have the same effect on our behaviour and on young minds. Instead, the UNESCO report recommends that we build AI which responds to queries in a gender-neutral way.
There may be specific circumstances where we should limit the use of GPT-2, such as for writing job adverts, where gendered language impacts the diversity of applicants. A gender-biased language model may slow progress to close the gender pay gap and amplify the male dominance of highly-paid jobs that we see in the ONS stats.
In their 6 month update, OpenAI shared a positive message: that they had seen little evidence of malicious use of their technology since release. While that’s certainly a good thing, we still need to take care around the well-intentioned uses of the technology. There doesn’t need to be any malicious intent to experience a negative effect, but with care, GPT-2 could have a positive influence on our society.
Thanks to the people who made this possible
If you’d like to read more about our methodology, click here.