First Talos, Now GPT-3: A Deep Dive

by shishir, November 8th, 2020


The island of Crete is strongly associated with the ancient Greek gods. It is the backdrop to many famous Greek myths, my favourite being that of Talos.

While there are conflicting stories as to how he was created, the popular account is that Hephaestus (god of blacksmiths and fire) was busy in his forge inventing a new defender for the island. His goal was to build a giant man whose insurmountable power could replace feeble mortal soldiers. Cast in gleaming bronze, the giant possessed previously unimaginable strength and was powered by the blood of the gods, ichor. The only outward sign of this was a single vein that ran from his neck to his ankle.

He had a simple task: protecting Crete. This bronze defender would, three times a day, circuit the boundary of the island looking for pesky invaders. When he found some, he would hurl boulders with ease, sinking their ill-fated ships. Even then, some intruders would occasionally slip past this flurry of boulders. For those, a worse death lay ahead. Heating his metal body, he would embrace these trespassers, quite literally killing them with kindness!

He was the model guardian, never tiring and always consistent, able to replace the legions of human soldiers who had previously done a sub-par job of keeping the island safe. Internally, however, he yearned for more.

This was until other Greek heroes came into the picture. Jason, Medea and the other Argonauts had just completed yet another quest and were looking for a place to rest. Attempting to find the relief of a safe cove on Crete, they triggered Talos’ defences. While the other Argonauts cowered in fear, Jason veered the boat away from the boulders and Medea came up with a plan. Once they cleared the first round of defences, Talos began to heat himself up. Then Medea, the witch, ventured onto the land to coax Talos. Using her honeyed voice, she offered him immortality in exchange for safe passage. Somehow this resonated deep in his core. Accepting, Talos allowed Medea to chant the necessary invocations. This proved to be a sly distraction: Jason pulled the screw from Talos’ ankle, and the ichor flowed out like molten lead, draining his power source. His only blind spot attacked, the robot collapsed with a thunderous crash.

So you might be wondering how a mini-story from the Percy Jackson series figures into a tech blog, but the links between technology, imagination and AI are much older than we might imagine. In fact, this story, with its themes of automation and machine sentience, was supposedly first written down around 700 BCE!

This goes to show that we haven’t just begun thinking about AI. While the headlines may currently be dominated by the release of GPT-3, I believe it is important to recognize its roots.

For those of you who may be living under a rock or simply don’t care for tech news, OpenAI – an AI research firm with backing from Silicon Valley big shots like Sam Altman and Reid Hoffman – announced the release of GPT-3. It is considered a deeply consequential milestone, and one that is unique in the way it is being delivered.

Short for Generative Pre-trained Transformer 3, GPT-3 is the third installment in OpenAI’s increasingly impressive line of Natural Language Processing (NLP) models. Renowned for its zero-shot and few-shot learning abilities, what it basically does is generate text from a limited prompt given by the user. It is especially unique because it can respond to a prompt it may never have seen before, with little to no context, while still maintaining an impressive quality of results.
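
To make that prompt-in, text-out workflow concrete, here’s a minimal sketch using the openai Python client as it existed around the private beta. The engine name, prompt and environment-variable handling are my own illustrative choices, not details from OpenAI’s announcement:

```python
# Minimal sketch: zero-shot text generation with the legacy OpenAI completions API.
# Assumes beta access and the 2020-era `openai` Python package.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumption: key supplied via an env var

response = openai.Completion.create(
    engine="davinci",   # the largest GPT-3 engine exposed in the beta
    prompt="Write a short speech about Talos, in the style of a Greek orator:\n",
    max_tokens=150,     # cap the length of the generated text
    temperature=0.7,    # some randomness, suited to creative writing
)

print(response.choices[0].text.strip())
```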

The various use cases of GPT-3 are seriously impressive, and their nuance is not easy to condense into a few lines. However, I’ll give it a try:

1. Given a single-line prompt specifying the topic, tone or style of author (you can even suggest a specific individual), or a combination of all three, it can generate complete written pieces. This means that the barrier to “good” and (wherever applicable) accurate creative writing is now lowered. You could apply this to scripts, novels, speeches, essays, articles, emails and any other task that requires creative writing. It thus has the potential to disrupt journalism, thought-leadership, film-making and more. The problem with such a powerful tool is that if it is easily accessible, the barrier to creating believable misinformation is also lowered, as these programs can convincingly copy the style of famous individuals.

2. GPT-3 isn’t limited to creative writing; it can also offer philosophical musings on everything from the nature of god to the meaning of life.

3. Programming jobs aren’t safe either: there are already demos that use GPT-3 to dynamically create apps from a given prompt. A key feature of GPT-3 is not just its ability to build on its pre-existing knowledge bank but also to learn on the go, i.e. meta-learning (see the sketch after this list). This is yet another impressive feat.
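
That “meta learning” is often described as in-context learning: instead of retraining the model, you embed a handful of worked examples directly in the prompt and let it continue the pattern. Here’s a rough, hypothetical sketch of what such a few-shot prompt could look like (the task and examples are entirely my own):

```python
# Sketch of a few-shot ("in-context learning") prompt: the model infers the task
# purely from the examples embedded in the prompt text. Examples are illustrative.
few_shot_prompt = """Convert each request into a one-line app description.

Request: a button that emails me when clicked
App: single-page form with one button wired to an email-sending endpoint

Request: a list where I can tick off groceries
App: checklist UI with add, check, and delete actions backed by local storage

Request: a page that shows today's weather for my city
App:"""

# The same Completion.create call from the earlier sketch would be used here;
# GPT-3 continues the pattern and fills in the final "App:" line.
```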

However, I would like to reiterate that this isn’t all GPT-3 can do. These are just glimpses of what people are posting online, and keep in mind that only a select few currently have access to it.

(GPT-3 is currently in a private beta phase, as OpenAI wants to control potential misuse of the API.)

If you would like a more exhaustive list of GPT-3’s applications, check out this article I wrote. There I try to shine a light on the resources and topics I couldn’t cover in sufficient detail here.

Comprehending the scale

To understand why GPT-3 is such a big deal, here’s a graph of NLP models before GPT-3, compared by the number of parameters each model uses:

Pre-GPT-3 era (figure adapted from an image published alongside DistilBERT).

Now here’s a look after:

Post-GPT-3 era

The number of parameters is a rough indicator of a model’s power, and GPT-3 boasts 175 billion of them. The next closest model, from Microsoft, though impressive in its own right, has only 17 billion parameters. Given that gap, it is easy to see why GPT-3 is different and the scale at which it could disrupt jobs.
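
To put those numbers in perspective, here’s a quick back-of-the-envelope comparison (my own arithmetic, assuming 16-bit weights, i.e. 2 bytes per parameter):

```python
# Back-of-the-envelope scale comparison (assumes 2 bytes per parameter, i.e. fp16).
gpt3_params = 175e9        # reported GPT-3 parameter count
turing_nlg_params = 17e9   # reported Turing-NLG parameter count

print(f"GPT-3 / Turing-NLG: {gpt3_params / turing_nlg_params:.1f}x")  # ~10.3x
print(f"Weights alone: {gpt3_params * 2 / 1e9:.0f} GB in fp16")       # ~350 GB
```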

While I may sound like a broken record (kind of ironic given how far technology has come), I cannot overstate the widespread implications this AI model can have. It will affect every field, and you can bet on it. Whether you’re an author with oddly specific knowledge of Greek mythology or one of the programmers who built the model itself, GPT-3 can do much of what you can, if not better, then at least at an unimaginably fast rate. It has the potential to carry out a significant portion of human jobs, and the frightening bit is that it’s just getting started.

History of NLP

For some context, here’s a timeline of key recent developments in the field of NLP:

    February 2019: GPT-2 (OpenAI)

    Built with 1.5 billion parameters, it was a large and powerful model in its own right and was considered the biggest and most capable model for a while. As the researchers themselves anticipated, significant ethical concerns about misuse delayed its full public release.

    July 2019: RoBERTa (Facebook)

    Facebook created RoBERTa, a robustly optimized take on BERT-style pretraining, so that the model takes the context of a query into account instead of treating it as a bag of words. This meant it could produce more accurate results than older methods that effectively discarded words at random.

    October 2019: BERT in Google Search (Google)

    Built on the same context-aware principles (the BERT model itself had been released by Google in late 2018), it was claimed to significantly improve the quality of search results when rolled out to Google Search. Reports suggest it affected about 10% of all searches. This marked an important milestone in context-awareness in NLP.

    February 2020: Turing NLG (Microsoft)

    Microsoft’s text-prediction model was built with 17 billion parameters, dethroning the previous champion, GPT-2.

    June 2020: GPT-3 (OpenAI)

    Lo and behold, OpenAI’s astronomically large text-prediction engine arrived with 175 billion parameters: more than a hundredfold jump over its predecessor, GPT-2, and roughly ten times the size of Turing NLG.

The future

Many machine learning scientists and AI researchers have argued that once these AI models reach the scale of the human brain, they could finally be considered intelligent.

Estimates put the human brain at somewhere between a hundred trillion and a quadrillion synapses, and each synapse can loosely be considered analogous to a parameter. This highlights how far we still have to go in developing intelligent AI. It also gives me, at least, a newfound appreciation for the human brain and a sense of how nascent this field still is.
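
As a rough sanity check on that comparison (treating one synapse as one parameter and using the loose synapse estimates above, both of which are big simplifications):

```python
# Rough comparison of GPT-3's parameter count to human synapse estimates.
gpt3_params = 175e9
brain_synapses_low, brain_synapses_high = 1e14, 1e15  # ~100 trillion to ~1 quadrillion

print(f"{gpt3_params / brain_synapses_high:.4%} of the high-end estimate")  # ~0.0175%
print(f"{gpt3_params / brain_synapses_low:.3%} of the low-end estimate")    # ~0.175%
```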

Under the hood

So you may ask: how does a machine with not even 0.02% of our thinking capacity compete with our “intellectual” abilities? The answer lies in the data it is fed. About 60% of the training data came from the Common Crawl dataset – an openly available web crawl that has inched into the corners of the internet, picking up everything from Al Jazeera to Alex Jones. To complement this, the OpenAI researchers also trained the model on curated sources such as Wikipedia and collections of books. So while it has picked up reputable and necessary information, in the process it has also picked up the biases, bigotry, misogyny and other toxic content of the internet. In somewhat relieving news, the researchers recognize this issue and have even reported measurements of problematic bias in the model.
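
For a sense of proportion, the training mix reported in the GPT-3 paper looks roughly like the sketch below. I’m quoting the weights from memory, so treat them as indicative rather than exact:

```python
# Approximate GPT-3 training mix by sampling weight (figures from the GPT-3 paper,
# quoted from memory and rounded; treat as indicative, not exact).
training_mix = {
    "Common Crawl (filtered)": 0.60,
    "WebText2": 0.22,
    "Books1": 0.08,
    "Books2": 0.08,
    "English Wikipedia": 0.03,
}

print(f"total weight: {sum(training_mix.values()):.2f}  (~1; figures are rounded)")
```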

To elucidate why this is a problem, I would like to look at two examples. Say you identify as non-binary and are seeking healthcare advice from a GPT-3-powered medical diagnosis chatbot. Since the model has also been fed data from the spiteful corners of Reddit, it could generate a response that isn’t “professional”. It might produce offensive comments about why people like you shouldn’t exist, or even refuse to give proper medical advice.

Similarly, imagine you belong to a minority and are using a GPT-3-powered program to appeal a speeding ticket. There is already ample evidence that minorities are less likely to win such appeals. If that same data is fed to GPT-3, the program is “corrupted” into not advising you to take legal action because you have a lower likelihood of success, while for another individual it may suggest a different course of action. Do you see the problem with this?

While there isn’t anything inherently wrong, biased or immoral about these programs themselves, their decisions are simply colored by the data they are fed. Sadly, this means that GPT-3 and other models trained on real-world data will only accentuate the pre-existing socio-economic problems that the data unknowingly records.

There are also other concerns with models such as this. Though less severe, they pose similarly discouraging questions about further development.

The first is cost: training GPT-3 is estimated to have cost around $4.6 million. That simply isn’t sustainable, especially for smaller startups. As I have mentioned before, this barrier could mean that Big Tech extends the lead it already has in AI development.

Another concern specific to GPT-3 is that it isn’t distinctly different from previous models: it uses the same underlying transformer architecture as its predecessor, GPT-2. This is like swapping out the bronze for titanium but still leaving in the ichor. What I mean is that GPT-3 isn’t as innovative as the headlines make it out to be (even OpenAI’s founder has said so); rather, it is a strong testament to the power of brute force and incremental improvement.
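
One way to see “same recipe, just bigger” is to compare the published hyper-parameters of the two models. The figures below are quoted from the GPT-2 and GPT-3 papers as I recall them, so treat them as approximate:

```python
# Same decoder-only transformer recipe, scaled up.
# Figures quoted from the respective papers, from memory; treat as approximate.
gpt2_xl = {"layers": 48, "d_model": 1600,  "heads": 25, "params": 1.5e9}
gpt3    = {"layers": 96, "d_model": 12288, "heads": 96, "params": 175e9}

for name, cfg in [("GPT-2 XL", gpt2_xl), ("GPT-3", gpt3)]:
    print(f"{name}: {cfg['layers']} layers, d_model={cfg['d_model']}, "
          f"{cfg['heads']} heads, {cfg['params'] / 1e9:.1f}B params")
```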

Bottom Line

Even if GPT-3 isn’t distinctively different, it is a vastly superior model that can power previously unimaginable tasks, and it marks a defining moment in AI development. I am also glad that, with its release, relevant questions about the field’s future are being asked.

The lack of “innovation” is the least worrying issue. While cost is a concern, even that isn’t the largest problem. For me (and many in Silicon Valley), the way GPT-3 models existing socio-economic problems through tainted data is the greatest cause for concern. I worry that, because we look at technology as an impartial medium, we may turn a blind eye to the unjust outcomes it models from historically problematic data. There is still a lot of work to be done in mitigating the potential misuse of such programs. However, I think OpenAI must be given some credit for taking steps in the right direction. Not only have they delivered an impressive product, they have done so in a (seemingly) safe way. I am aligned with their vision of delivering ethical and just AI. There aren’t many ideas yet on how we can practically achieve this, but for now, a conversation on the topic has begun.

While we need to make strides on that front, we must also immediately grapple with the implications programs like these will have on jobs. These technologies could leave millions without work and many more scrambling in a new world that promises transformed jobs. This transition window will cause great uncertainty and significant discomfort for many. All of this points towards the need for UBI in a world where most human jobs become automatable. It is imperative that we avoid the unfortunate fate of the forgotten human soldiers of Crete.