Ben Mann


Why AI research may be accelerating faster than experts realize

For the last nine months I’ve immersed myself in AI research. What I’ve learned has changed how I see the world and the role I want to play in it. For example, when I see people developing habits or learning new skills, I think of reinforcement learning strategies I’ve read about. When I consider what problems I want to work on, they are AI problems. It is a paradigm shift from telling the computer what to do, to telling the computer what you want.

The AI hype train is operating at full steam. Investment in AI startups increased 8x from 2012 to 2016. Many AI researchers downplay progress because our current techniques are much more fragile than the lay public realizes. It takes a deep knowledge of a large set of mathematical underpinnings and tricks to get even the simplest networks training.

That said, I want to share some of the recent breakthroughs. I was amazed when I read about them. They changed my estimate of how quickly AI is progressing.

Biological hints for neural net capability

Before I get into the AI research I want to cover some supporting biology research I came across while researching this post. Trends in the speed at which biological neural nets operate and the effect of increasing their size suggest there may be room for improvement. If there is such a thing as peak intelligence, it’s unlikely we’ve hit it already, either for AI in the form of artificial neural networks or biological neural networks.

Biological evolution is slow compared to our ability to simulate, and involves constraints like energy efficiency that computers don’t have to deal with. For example, if our brains and heads were bigger, we wouldn’t be able to fit through the birth canal. Neurons can’t fire very rapidly because they depend on neurotransmitter concentrations that need to be refreshed. Computer circuits can operate ~10 million times faster than neurons.

Research done this year shows that intelligence is strongly correlated with the absolute number of neurons in the cortical areas of the brain. Density of these neurons also appears to be important, which makes sense since shorter connections mean faster communication. The same researcher found humans likely have more neurons than almost any other animal, despite differences in brain size. Whales are an exception, but their brains are much less dense. It’s easy to imagine more and denser neurons could make an organism even smarter than a human.

The biological correlation between number of neurons and capability appears to be roughly true for AI as well. New software techniques are allowing us to train bigger and bigger neural networks which achieve state of the art results. New hardware will make that easier and more effective.

New hardware will be much better for running AI

I predict that a shift to new, more brain-like computational architectures will put us at least 10 years ahead of where traditional methods of increasing processing power would take us. Graphcore raised $60M this year to build such hardware.

Although GPUs are great for running AI and each computation is much faster than biological neurons, we still have a long way to go before we match the computational power of the brain. Modern GPUs can compute massively more per device than CPUs, so we can process a whole image at once more like the brain does. But they are not nearly as parallel as the brain. Also, not every neuron is connected to every other neuron. This is called sparsity and by default GPUs can’t handle sparsity well. New research suggests using fancy tricks to represent sparsity improves performance, but few researchers have the skills to do so. New hardware may make sparsity easier to program.
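To make the sparsity point concrete, here is a minimal sketch in plain Python (not GPU code) that compares a dense matrix-vector product with one that stores only the nonzero connections. The function names and the tiny 4x4 example layer are illustrative assumptions, not taken from any particular library or paper.

```python
def dense_matvec(weights, x):
    """Multiply a dense matrix (list of rows) by a vector, touching every entry."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def to_sparse(weights):
    """Keep only the nonzero connections as (row, col, value) triples."""
    return [(r, c, w)
            for r, row in enumerate(weights)
            for c, w in enumerate(row)
            if w != 0]


def sparse_matvec(triples, x, n_rows):
    """Multiply using only the stored nonzero connections."""
    out = [0.0] * n_rows
    for r, c, w in triples:
        out[r] += w * x[c]
    return out


# Hypothetical 4x4 layer in which most neurons are not connected.
weights = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.5],
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

triples = to_sparse(weights)
dense_result = dense_matvec(weights, x)
sparse_result = sparse_matvec(triples, x, len(weights))
```

Both versions give the same answer, but the sparse one does 3 multiplications instead of 16. The catch is that it jumps around in memory in an irregular pattern, which is the kind of access that hardware built for large regular blocks of numbers tends to handle poorly.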

Companies are already putting more resources into specialized hardware. Google created the first widely used AI-specific chip, but it’s still designed to fit on a standard datacenter rack rather than a very large box customized for computational density. The new systems like Graphcore’s will be unlike modern supercomputers. When you run a program, it will not have to hit the network to exchange data. The whole thing will appear to be one machine, and it will excel at parallelism and sparsity.

The capabilities of a single machine might seem irrelevant given our excellent datacenter technology. If we can already fan out computations to thousands of machines, will having a single powerful machine really help? Talking to my PhD friends doing AI research, I was astonished to hear how rarely they used more than one computer per experiment. They didn’t care if an experiment took a few days to run. Multiple computers were usually only used for tuning. My impression from reading over a hundred recent AI papers is that this is broadly true across universities. According to the researchers I talked to, this preference for single machines comes from needing to iterate rapidly on a design even more than from cost considerations. You’re not sure your idea will work at all, and there’s no standard way of effortlessly scaling up a single experiment. Fanning out computation still has too much programming overhead.

If I learned anything over the last 6 years as an engineer, it’s that the pace of progress in any domain is limited by iteration speed (see Gossamer Condor/Albatross). Once we have a single machine that allows training something in minutes instead of days, the pace of progress will rapidly increase.

Looking at the brain to find neglected research areas

By looking at artificial analogues for each part of our brain, perhaps we can get closer to estimating how close we are to a more brain-like intelligence. In my oversimplified model of the brain, we have the following components: audio, visual, motor, and somatosensory cortices; memory in hippocampus; emotions in limbic system and amygdala; planning, reasoning, and language processing in frontal lobe; and control of biological systems in the brain stem.

In audio and visual tasks, AI systems are doing well. Object recognition, speech recognition and machine translation are superhuman already.

For AI, let’s ignore the limbic system and biological systems, except insofar as they’re necessary to generate rewards. I’m guessing these systems came out of uniquely evolution-driven reward structures such as the need to reproduce, get and process calories, etc., and may not be instructive for research purposes.

AI performance in sensory and motor control is worse than human when you consider fine-grained manipulation tasks, but if you consider self-driving cars to require sensor input and motor output, Tesla is already 40% safer than human drivers on highways. From riding in self-driving cars myself, I can attest they’re already better at driving than I am.

Memory is where AI fails hardest. I’ll discuss that in the next section.

We’re already seeing combinations of different modalities improving performance in some tasks, such as audio plus visual for machine translation. My theory is that if you take all these subsystems which are capable of approaching human performance and you strap them together, we will get closer to something capable of generalizing to new tasks more easily, which is one measure of intelligence.

Too few people working on AI memory methods

Even the best AI agents forget their context after only a few seconds. Few researchers are focusing on this now, and there aren’t good reasons for this. This is a solvable problem. It’s just a matter of more effort being put in.

The LSTM, the current state-of-the-art neural net component for sequences, is capable of keeping context for 30–100 timesteps. In machine translation, that gives a few sentences. In playing video games, that gives a few seconds. That’s not nearly enough to learn more complicated strategies, and severely limits the learning capacity of AI agents.

In addition, when we train a modern neural net on one problem such as an Atari game and then train it on a different game, it’ll forget how to play the first one. This is called the catastrophic forgetting problem.

Memory methods such as DeepMind’s Differentiable Neural Computer and Neural Episodic Control allow the system to write data to memory that may be preserved over arbitrary durations, sort of like our hippocampus. Very few researchers are focused on this area, but solving it will give us vastly more capable agents. We only need one big breakthrough here to convince more researchers to look here, and then progress will accelerate.
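As a rough illustration of what writing to memory and reading it back later can look like, here is a toy key-value store that is read by similarity. It is only loosely inspired by the idea of an episodic memory and is not DeepMind’s implementation. The class name, the inverse-distance weighting, and the example vectors are all assumptions made for this sketch.

```python
import math


class EpisodicMemory:
    """Toy key-value store: write (key vector, value) pairs, read by similarity."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, value):
        # Entries persist until explicitly removed, no matter how many
        # timesteps pass between writing and reading.
        self.keys.append(list(key))
        self.values.append(float(value))

    def read(self, query, k=2):
        """Return a distance-weighted average of the k nearest stored values."""
        if not self.keys:
            raise ValueError("memory is empty")
        scored = sorted(
            (math.dist(query, key), value)
            for key, value in zip(self.keys, self.values)
        )[:k]
        # Inverse-distance kernel; the small constant avoids division by zero.
        weights = [1.0 / (d * d + 1e-3) for d, _ in scored]
        total = sum(weights)
        return sum(w * v for w, (_, v) in zip(weights, scored)) / total


memory = EpisodicMemory()
memory.write([0.0, 0.0], 1.0)     # a situation that went well
memory.write([10.0, 10.0], -1.0)  # a very different situation that went badly
recalled = memory.read([0.1, 0.0], k=1)
```

Unlike an LSTM’s hidden state, nothing here decays with time: a situation stored thousands of steps ago is recalled as easily as one stored a moment ago, as long as the query resembles it.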

I asked a researcher why few people in the research community are working on memory methods. He told me that they’re new, that researchers tend to have specialties, and that no one was willing to pause their research to learn a new branch of AI. He likes them and hopes to have time to work on them in the future, but it’s just not a priority right now.

Representing meaning

When we think about something, before we turn it into words, how is that meaning represented in our heads? What is our internal monologue, which seems so central to our sentience? How could we possibly teach a machine to create a similar sense of meaning? In this section I’ll present some research that suggests we might be close to understanding how to create an internal monologue for an AI system.

Machine translation involves representing meaning. Words are just proxies for our actual thoughts, so when we translate, it’s easy to lose the nuance. If a machine is to translate well, it needs to first understand the meaning, and then be able to construct the same meaning in a different language.

Traditionally machine translation has depended on massive datasets of text in one language and the corresponding translations in a target language. For language pairs without this training data, Google translated to English first, and then into the target language, so English served as a direct representation of semantic meaning, or interlingua.

Recently Google rewrote their entire machine translation stack and published their methods in Google Neural Machine Translation. The neural network takes the input sentence, e.g., “知识就是力量”, and using an encoder turns it into a high-dimensional vector. From this numerical representation, it then goes through a decoder into the target language, producing, e.g., “knowledge is power.” If I lost you in this paragraph, the point is that the new system found a numerical way to represent meaning without humans doing any direct data entry.

Because of this language-independent numerical representation, the system can translate directly between any pairs of languages, including those for which there is no training data, such as Korean and Swahili.

On top of that, this numerical representation has some interesting properties. If you take the vector for “good” and you perturb it by a little, the system will output a synonym such as “nice”. You can also add or subtract words and get something meaningful.
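Here is a small sketch of what perturbing a vector and adding or subtracting words means in practice. The three-dimensional vectors below are hand-picked for illustration and are not learned by any real translation system. The helper names `cosine` and `nearest` are also assumptions of this sketch.

```python
import math

# Hand-picked 3-dimensional vectors for illustration only; a real system
# learns hundreds of dimensions from data.
VECTORS = {
    "good":  [0.9, 0.1, 0.0],
    "nice":  [0.8, 0.2, 0.0],
    "bad":   [-0.9, 0.1, 0.0],
    "king":  [0.0, 0.9, 0.9],
    "queen": [0.0, 0.9, -0.9],
    "man":   [0.0, 0.1, 0.9],
    "woman": [0.0, 0.1, -0.9],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(vector, exclude=()):
    """Return the known word whose vector is most similar to `vector`."""
    candidates = (w for w in VECTORS if w not in exclude)
    return max(candidates, key=lambda w: cosine(vector, VECTORS[w]))


# Nudge the vector for "good" slightly and look up the closest other word.
perturbed_good = [0.85, 0.18, 0.0]
synonym = nearest(perturbed_good, exclude={"good"})

# Add and subtract word vectors, then look up the closest remaining word.
analogy = [k - m + w for k, m, w in
           zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])]
result = nearest(analogy, exclude={"king", "man", "woman"})
```

With these made-up numbers, the perturbed “good” lands nearest to “nice”, and “king” minus “man” plus “woman” lands nearest to “queen”. In a learned representation nobody picks the numbers by hand, which is what makes the property surprising.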

I was blown away that it is possible to create a language-independent representation of meaning in a production system today. Perhaps this suggests our internal monologues are not special, and we could get a machine to have such an internal monologue if we knew why it would be useful and how to use it.

Sentiment: emergent property or uniquely human?

Sentiment, which is how positive or negative a sentence or phrase is, is extremely hard to represent with a set of formal rules. For example, “very good” has positive sentiment, but “not very good” has negative sentiment. “Not good” has strongly negative sentiment. The same word’s sentiment is very different depending on modifiers, and modifiers of those modifiers. This is a simple example, but it gets complicated fast. The Stanford Sentiment Treebank represented the state of the art in trying to represent sentiment, and it involved generating detailed parse trees, with individual words and phrases painstakingly tagged.

If you ask even a child, they’d probably be able to tell you whether the sentiment of a given sentence is positive or negative. So from this it seems humans are easily able to understand sentiment, while machines can’t. Perhaps that makes it a special human thing.

OpenAI’s Alec Radford et al. recently published Learning to Generate Reviews and Discover Sentiment. Alec trained a neural net to predict the next character of an Amazon review given the characters seen so far. For example, input “This camera wa” would lead to output “s”. After training on every Amazon review ever, you can feed the system a random character, then feed its output back into itself, and so on, until you have entirely hallucinated sentences. These sentences tend to lose context after a bit, but they’re basically coherent and believable.
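The predict-then-feed-back loop can be sketched in a few lines. To keep it self-contained, this sketch swaps the neural net for simple counts of which character follows each three-character context, so it is a stand-in for the technique rather than the method from the paper. The corpus is made up and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train(text, order=3):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return counts


def predict_next(counts, context, order=3):
    """Return the most frequent next character for the given context."""
    return counts[context[-order:]].most_common(1)[0][0]


def generate(counts, seed, length, order=3, rng=None):
    """Sample a character, append it, and feed the result back in."""
    rng = rng or random.Random(0)
    out = seed
    for _ in range(length):
        options = counts.get(out[-order:])
        if not options:
            break  # this context was never seen during training
        chars, weights = zip(*options.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


# Tiny made-up corpus standing in for the Amazon reviews.
corpus = "This camera was great. This camera was cheap. This camera was light. "
model = train(corpus)
next_char = predict_next(model, "This camera wa")  # "s"
sample = generate(model, "Thi", 40)
```

A counting model like this only ever sees three characters of context, so its output drifts quickly. A recurrent neural net carries a learned state forward instead, which is where something like a sentiment signal has room to live.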

The interesting part comes when you chop off the top layer of the model and train a linear classifier (logistic regression) against a handful of sentiment examples from the Stanford Sentiment Treebank. The review model is able to classify sentiment extremely accurately, competitive with state of the art, despite not being built for that purpose. Alec et al. wondered what was going on under the hood, and found that one neuron out of ~4000 had simply discovered sentiment. All the weight of the regression was just taking the output of this neuron.
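To show what it looks like when a linear model puts its weight on a single neuron, here is a toy probe: a small logistic regression trained with plain gradient descent on made-up activations. Unit 2 is deliberately constructed to track sentiment, so this only illustrates how you would spot such a unit, not that real models contain one. All the numbers and names are assumptions of this sketch.

```python
import math

# Made-up activations from 4 hidden units for 6 reviews. Unit 2 is constructed
# to track sentiment; the other units carry small unrelated values.
features = [
    [0.2, -0.1, 1.0, 0.3],
    [-0.3, 0.2, 1.0, -0.1],
    [0.1, 0.3, 1.0, 0.2],
    [0.3, 0.1, -1.0, -0.2],
    [-0.2, -0.3, -1.0, 0.1],
    [0.1, -0.2, -1.0, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive review, 0 = negative review


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features, labels, steps=200, lr=0.5):
    """Fit logistic regression weights (no bias term) by gradient descent."""
    weights = [0.0] * len(features[0])
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grads[j] += error * xi / len(features)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


weights = train_probe(features, labels)
# The unit with the largest absolute weight is the one the probe relies on.
dominant_unit = max(range(len(weights)), key=lambda j: abs(weights[j]))
```

Inspecting `weights` after training shows the probe leaning almost entirely on unit 2. The surprising part of the research is that nobody constructed such a unit: it emerged from predicting the next character.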

The amazing thing here is that sentiment happened to be a useful way to compress the Amazon reviews. Rather than being something mushy and complicatedly human, it was simply an emergent property of the data. I believe there are many more concepts we think of as unique to humans, but are just efficient ways of compressing our experience.

Similar to being able to represent semantics in machine translation, this ability to represent a mushy human concept like sentiment today surprised me. I would have predicted many more years of research before approaching this result.


We appear to be far from robots that can behave like humans, in both hardware and software. Simply stacking colored blocks is state of the art now. Research here will advance rapidly once the following bottlenecks are removed:

  • Research robots are very expensive due to lack of scale.
  • Researchers were able to transfer simulated learning to real robots only very recently, so previous robotics research was limited to real time and real robots.
  • It hasn’t yet been proven that we can cheaply and safely put robots in people’s homes.

Now that we can learn from simulation, we’re only a few breakthroughs away from mass market, cheap, ubiquitous robots with human-like manipulation capabilities. At scale, the hardware would probably cost less than $30,000, which is much cheaper than a low-skill human laborer.

The reason robotics may be important to AI research pace is that living in the real world could be important for gaining intelligence. If the Google Translate and OpenAI sentiment examples are any indication, the way we think may be a way of compressing and predicting our environment. Robots powered by a constantly learning AI may get us there faster. Market pressure will ensure this happens.

Hierarchical neural networks

So far I’ve covered the major components that lead me to believe the pace of research is accelerating, but independent of each other these components don’t appear to be capable of long-term planning. Research in hierarchical nets shows we can tie together different specialized networks to serve more complex goals. Jacob Andreas recently showed that we don’t even have to specify what each of the subnetworks should do as long as we use the right architecture and training procedure.

When we’re able to take all these human-level or nearly human-level components and glue them together, that may lead to a leap forward in AI capabilities.

Experts’ AI timeline

A recent survey of when AI researchers think AI will achieve human-level intelligence puts the average 50% confidence estimate at 2062, but with an extremely high variance (see graph above). This and other conflicting evidence from the survey says to me that they haven’t thought hard enough about it, or aren’t seeing the acceleration in the pace of research. One of Clarke’s three laws is:

When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

Why does the elderly scientist say a thing is impossible? I think it’s because they’ve specialized too far, and can’t remember the days when anything was possible with the right context and effort. With new AI breakthroughs almost weekly, it’s hard to imagine their estimates being correct. They may not be elderly scientists and they may not say it’s impossible, but I’m not sure they’ve thought hard enough about the potential.

Before immersion in recent research, my 50% confidence estimate of when we’d be able to create human-level AI was 50–100 years. Now, I think 10–20 years is more likely. That’s soon enough that I decided to devote more serious thought to this technology. I think it would be a waste to spend 10 years working on something that will become obsolete when AI becomes sufficiently capable.


I hope the above research is as fascinating and surprising to you as it was to me. It makes me think the daily increase in AI investment will not soon lead to diminishing returns, but on the contrary lead to even faster progress. The frequency of new breakthroughs is already overwhelming AI researchers. At OpenAI I often referenced papers my coworkers hadn’t read even though they were relevant to what they were working on. This overload means it will be harder for any individual researcher to have an accurate estimate of the pace of progress. Keep this in mind when you read about estimates of AI progress in the news. Spend time updating your own opinion with new data. Will it change your life like it did mine?

Further reading

To learn more about why powerful AI systems might be dangerous, why that’s an important problem to work on, and relevant career paths for both technical and non-technical people, I highly recommend 80,000 Hours Project’s overview of the subject. It’s thoroughly researched and considers aspects like how few people are already working on it and the potential for impact.
