I write marginally better than GPT-2 (small model only)
On 14 February 2019, OpenAI posted their peculiar love letter to the AI community. They shared a 21-minute-long blog post about their new language model, GPT-2, along with examples of the text it had generated and a slight warning. The post ends with a series of possible policy implications and a release strategy.
“…we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research” (OpenAI Charter)
We have grown accustomed to OpenAI sharing their full code bases alongside announcements, but OpenAI is also committed to making AI safe. On this occasion, they deemed releasing the full code unsafe, citing concerns around impersonation, misleading news, fake content, and spam/phishing attacks. As a compromise, OpenAI shared a small model with us. While less impressive than the full GPT-2 model, it did give us something to test.
So that’s exactly what I did! Last week, I set up the small model of GPT-2 on my laptop to run a few experiments.
First, for a bit of fun, I thought I’d test its skill at creative writing. I didn’t hold great expectations with only the small model to hand, but I thought I could learn something about the capabilities of the model, and perhaps start a few interesting conversations about technology while I was at it.
I joined a popular online writing forum with an account named GPT2 and wrote a short disclaimer, which said:
** This is computer generated text created using the OpenAI GPT-2 ‘small model’. The full model is not currently available to the public due to safety concerns (e.g. fake news and impersonation). I am not affiliated with OpenAI. Click this link to find out more >> https://openai.com/blog/better-language-models/ **
The setup seemed perfect. I had a ready-made prompt to feed into GPT-2, and the model’s output was exactly the length expected for submissions. I could even get feedback from other users on the quality of the submission. I chose a few specific blurbs and fed them into GPT-2 as prompts, running the model multiple times before it created a plausible output.
I pasted the story into the platform with my disclaimer at the top, excited to see what sort of questions I would receive from the community. I hit enter, and within seconds:
‘You have been banned.’
I was confused. I had been abundantly transparent about my use of computer-generated text and had not attempted to submit a large number of posts, just one. This was where I learned my first lesson.
I had made a strong, conscious effort to be as transparent as possible. I didn’t want to deceive anyone into believing this was anything other than computer-generated text. Far from it: I wanted people to know it was created by GPT-2, to engage them in a conversation around AI safety. I naively thought my honesty would spare me a negative response, but that was not enough for this community.
I messaged the moderators. This is the reply I received:
This is how the conversation began, but know that it ended happily!
Shortly after the release of GPT-2 I saw two primary reactions to the limited release. Parts of the mainstream media dusted off their favourite Terminator photos, while some people in the AI community took the view that it was a marketing ploy, because any technology too dangerous to release must be very impressive indeed.
I only had access to the severely limited ‘small model’ of GPT-2. You need only use it for a few minutes to know just how far it is from being a Terminator-style risk, yet it still highlighted the need for a well-thought-through release strategy. Poor implementations of technology can have a negative impact on public sentiment, and in this instance, it was my choice of forum and application of the technology that raised the alarm.
It’s possible that GPT-2 could write a charming story, but it won’t hold the same place in our hearts if it’s not both charming and authentic. Max Tegmark makes this point in Life 3.0, suggesting that AI could create new drugs or virtual experiences for us in a world where there are no jobs left for humans. These drugs could allow us to feel the same kind of achievement that we would get from winning a Nobel prize. But it’d be artificial. Tegmark argues that no matter how real it feels, or how addictive the adrenaline rush is, knowing that you’ve not actually put in the groundwork and knowing that you’ve effectively cheated your way to that achievement will mean it’s never the same.
“Let’s say it produces great work”
For whatever reason, people desire the ‘real product’ even if it’s functionally worse in every way than an artificial version. Some people insist on putting ivory keytops on a piano because it’s the real thing — even though they go yellow, break easily and rely on a material harmful to animals. The plastic alternative is stronger and longer lasting, but it’s not the real thing. As the message (from the forum) says, even if ‘it produces great work’, possibly something functionally better than any story a human could have written, you don’t have the authentic product of ‘real people, who put time and effort into writing things.’
The message also highlights two things: a human submission takes effort and creativity to produce, and that effort matters, even if the actual output is functionally no better than computer-generated text. I think I agree. I have always found that a great book means so much more to me when I discover the story behind it: the tale of the writer’s own conscious experience that led them to create the work.
Ray Bradbury’s magnum opus, Fahrenheit 451, is a brilliant book in itself, but it was made a little bit more special to me by the story behind its creation. Bradbury had a young child when the novel was conceived and couldn’t find a quiet place at home to write. He happened across an underground room full of typewriters for hire at 10c per half hour. Bradbury wrote the whole book in that room, surrounded by others typing things he knew nothing about. Nine days and $9.80 later, we had Fahrenheit 451.
This doesn’t only apply to generated text. I recently spent far too much money importing a vinyl copy of Vulfpeck’s ‘Sleepify’ album, a record with 10 evenly spaced tracks and completely smooth grooves. Why? It’s just pure silence! While this is an awful record based on its musical merit, and even the most basic music generation algorithm could have created something better, I love it for its story.
The band Vulfpeck put this album on Spotify in 2014 and asked their fans to play it overnight while they slept. After about 2 months the album was pulled from Spotify, but not before the band made a little over $20,000 in royalty payments, which they used to run the ‘Sleepify Tour’ entirely for free.
As an aside, I think an AI like GPT-2 could also do a great job of creating a charming back-story behind the story. To the earlier point though, if it didn’t actually happen, and if there wasn’t conscious human effort involved, it lacks authenticity. As soon as I know that, it won’t mean the same thing to me.
One thing that came out of my conversation with the moderators, which I’d not even considered, was that it’s not all about the people reading the content. Sometimes there’s more pleasure and personal development to be gained from writing, and that’s something the forum actively wanted to promote.
In Netflix’s new show ‘After Life’ (no real spoilers!), the main character, Tony, works for a local newspaper. Throughout the series, Tony pokes fun at the newspaper, from the mundane selection of stories that their town has to report on to the man delivering their papers, who it turns out just dumps them in a skip nearby. Nobody actually reads the paper, and Tony takes that to mean their work is meaningless. Only at the very end of the series does Tony realise that it doesn’t matter who reads the paper, or if anyone reads it at all. What’s important instead is being in the paper. Everyone should have the chance to have their story heard, no matter how mundane it might sound to others. If it makes them feel special and part of something bigger, even just for a moment, then it’s doing good.
I’ve been writing for a few years now, and aside from the 18 blogs I now have on Medium, I have a vast amount of half-written thoughts, mostly garbage, and an ever-growing list of concepts that I’d like to expand on one day. Sometimes my writing is part of a bigger mission, to communicate the uses, safety and societal impact of AI to a wide audience, and at those times I do care that my writing is read and, even better, commented on and discussed. At other times, I use it as a tool to get my thoughts in order: to know the narrative and message behind a presentation that I need to make or a proposal that I need to submit. Sometimes, I write just because it’s fun.
If AI is capable of writing faster and better than humans, and it very much seems like it can (no doubt within some narrowing parameters), it doesn’t mean that we can’t keep writing for these reasons. AI might capture the mindshare of readers, but I can still write for pleasure even if nobody is reading. However, I think it’ll mean people write a whole lot less.
I began writing because I was asked to for work. It was a chore at the start: difficult, time-consuming and something I’d only apply myself to because there was a real and definite need. Gradually it became easier, until suddenly I found myself enjoying it. If it weren’t for that concrete demand years ago, I don’t know if I ever would have begun, or whether I’d be here now writing for pleasure.
It’s clear that language models like GPT-2 can have a positive impact on society. OpenAI has identified a handful of examples, like better speech recognition systems, more capable dialogue agents, writing assistants and unsupervised translation between languages.
None of this will be possible though unless we get the release strategy right, and have robust safety processes and policy to support it. Creative writing might not be the right application, so we need to ensure we identify the applications that society can agree are good uses of these language models. Codifying these applications that we approve and those that we want to protect in policy will help others to make the right decisions. Good communication will ensure that people understand what’s being used, why, and keep them onside. Robust security will prevent nefarious parties from circumventing policy and best practice.
It’s important to note that these lessons are anecdotal, derived from a single interaction. Sound policy is based not on anecdotal evidence but on academic research that draws in a wide range of opinions with an unbiased methodology, before boiling down values into concrete rules for everyone to follow.
This isn’t a conclusion.
This story is just beginning.