I do stuff with computers, host the Data Science at Home podcast, and code in Rust and Python
TL;DR; GPT-3 will not take your programming job (Unless you are a terrible programmer, in which case you would have lost your job anyway)
Once again the hype of artificial intelligence has broken in the news. This
time under the name of GPT-3, the successor of GPT-2 (of course), a
model so large and so powerful that it is making people think we
finally made AGI, artificial general intelligence, possible (AGI is the
kind of stuff that charlatans like Ben Goertzel keep claiming since a
For those who are new to the topic, GPT-2 is a model from the field of NLP (Natural Language Processing) that generates text from an input sample. Basically, given a few words or a structured sentence in English or another language, it continues generating text that is consistent with the input. <sarcasm> Such an impressive result! Such an amazing artificial intelligence! </sarcasm>
However, the way such a model is trained is neither magic nor mysterious. Let me be clear on this: it is neither magic nor mysterious. In fact, it is
very simple. Given a bunch of words, the model is asked to predict the
next word that makes the most sense in that particular context. That is
it. A good old statistician would do this with a Markov process or
other probabilistic approaches that could lead to similar results. GPT-3
does this with 175 billion parameters (yep, no typos). The input text
is nothing more than whatever is publicly available on the Internet:
discussion forums, Reddit threads, digitized books, websites,
Wikipedia, you name it.
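To make the "predict the next word" idea concrete, here is a minimal sketch of the good old statistician's approach: a first-order Markov (bigram) model that counts which word follows which. This is not how GPT-3 is implemented; the function names and the toy corpus are made up for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # cat  ("cat" follows "the" twice)
print(predict_next(model, "dog"))  # None ("dog" never appears in the corpus)
```

The model can only ever return words it has already seen following the input word, which is the whole point of the comparison.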
Using GPT-3 is very simple (provided one has a really, really fat machine
packed with GPUs). It can be done in three easy steps, given the trained
model (that would cost you several thousand dollars):
1. provide a description of the task, e.g. “translate English to French”
2. provide an example (optional), e.g. the chair => la chaise
3. provide an “unseen” case, e.g. the table =>…
and GPT-3 will magically make a translation for you because you know, <sarcasm> that's what artificial intelligence can do</sarcasm>.
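The three steps above boil down to assembling a piece of text. Here is a rough sketch of that assembly; the helper `build_few_shot_prompt` and the exact line layout are assumptions made for illustration, and no real GPT-3 call is involved.

```python
def build_few_shot_prompt(task_description, examples, unseen_input):
    """Assemble a prompt: task description, optional examples, then the unseen case."""
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The unseen case is left open so the model can continue the text.
    lines.append(f"{unseen_input} =>")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French",
    [("the chair", "la chaise")],
    "the table",
)
print(prompt)
```

Whatever the model writes after the final `=>` is taken as the answer. Passing an empty list of examples gives the variant where step 2 is skipped.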
In fact, GPT-3 behaves just like a massive lookup table, because that’s what it is, and searches for something it already encountered in the input (during training). When used this way, it does not even perform back-propagation: with such a massive number of parameters, the model is applied as is, and the examples in the prompt update none of them. Does it sound intelligent now? It certainly does not to me. The most intelligent component in GPT-3 is the transformer architecture that I have discussed extensively in a podcast episode.
Among the many demos practitioners have been creating since the release of such a large lookup table, there is one skill that GPT-3 seems to have
acquired during training: computer programming. This has clearly alarmed
several developers who are not really familiar with machine learning
(though they know what lookup tables are). Not only can GPT-3 write
computer code; apparently, one can also provide a description of what a web
application should look like in plain English, and GPT-3 would magically generate the source code that implements such an app.
<sarcasm> Finally we got rid of developers and saved a lot of $$ </sarcasm>
Now let’s be serious and put things in perspective. We have a model that is
great at looking at a bunch of words and predicting the next most
appropriate word. Since each word is converted to a numeric vector -
because guess what? computers are good with numbers - there is no way
such a model would understand what that text is about (except under the
terms of topic classification), nor how it is generated. Again, the only
task that GPT-3 can perform is guessing one word, given a certain
As for the coding task specifically (though this generalizes easily), coding is the result of several skills and capabilities that go beyond language syntax. Code can be elegant, short, abstract, and highly maintainable, and it usually follows certain engineering principles (I am clearly referring to proper programming, no spaghetti here). While all of this might be observed directly in the source code, it cannot easily be separated from it.
To explain this in the language of real engineers: looking at millions of buildings can tell you a lot about their materials and shapes, but very little about the construction principles and the geology that are definitely required for a building to be safe, durable, and resilient.
The biggest problem of machine learning algorithms is that they can only learn from data. When such data is biased, incomplete, or simply inaccurate, whatever the model extrapolates about the observed phenomenon will also be biased, incomplete, or inaccurate.
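A toy illustration of that point, under heavy assumptions: a "model" that simply returns the most frequent completion seen in its training snippets. The snippets and the function `most_likely_completion` are invented for this example. If the questionable pattern dominates the data, the questionable pattern is what comes out.

```python
from collections import Counter


def most_likely_completion(training_snippets, prefix):
    """Return the most frequent continuation of `prefix` in the training snippets."""
    completions = Counter(
        snippet[len(prefix):]
        for snippet in training_snippets
        if snippet.startswith(prefix)
    )
    if not completions:
        return None
    return completions.most_common(1)[0][0]


# Hypothetical training data: the less idiomatic comparison is over-represented.
snippets = [
    "if x == None:",
    "if x == None:",
    "if x == None:",
    "if x is None:",
]

print(most_likely_completion(snippets, "if x "))  # == None:
```

Nothing in the counting step knows which form is better; it only knows which one is more common.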
Moreover, GPT-3 needs an enormous amount of unbiased data. The Internet is exactly the place that fails to meet such a requirement.
In addition, good developers (and humans) do not need to read about
pointers in C/C++ or lifetimes in Rust millions of times in order to
master such concepts. A model that learns the way GPT-3 does is, without
loss of generality, a probabilistic model. Developers do not write code
on a probabilistic basis (not even those who copy&paste from
To be clear one more time: when it comes to coding skills, GPT-3 is similar to a developer who has some familiarity with the syntax of a programming language, without knowing any of the abstractions behind it, and who is constantly referring to a massive dictionary of code snippets. Rather than worrying about such a coder taking my job, I’d be more worried about having such a coder on my team.
Listen to the podcast version of this post.