Some call it “strong” AI, others “real” AI, “true” AI or artificial “general” intelligence (AGI)… whatever the term (and important nuances), there are few questions of greater importance than whether we are collectively in the process of developing generalized AI that can truly think like a human — possibly even at a superhuman intelligence level, with unpredictable, uncontrollable consequences.
This has been a recurring theme of science fiction for many decades, but given the dramatic progress of AI over the last few years, the debate has been flaring anew with particular intensity, with an increasingly vocal stream of media and conversations warning us that AGI (of the nefarious kind) is coming, and much sooner than we’d think. Latest example: the new documentary Do you trust this computer?, which streamed last weekend for free courtesy of Elon Musk, and features a number of respected AI experts from both academia and industry. The documentary paints an alarming picture of artificial intelligence, a “new life form” on planet earth that is about to “wrap its tentacles” around us. There is also an accelerating flow of stories pointing to ever scarier aspects of AI, with reports of alternate reality creation (fake celebrity face generator and deepfakes, with full video generation and speech synthesis being likely in the near future), the ever-so-spooky Boston Dynamics videos (latest one: robots cooperating to open a door) and reports about Google’s AI getting “highly aggressive”.
However, as an investor who spends a lot of time in the “trenches” of AI, I have been experiencing a fair amount of cognitive dissonance on this topic. I interact daily with a number of AI entrepreneurs (both in my portfolio and outside), and the reality I see is quite different: it is still very difficult to build an AI product for the real world, even if you tackle one specific problem, hire great machine learning engineers and raise millions of dollars of venture capital. Evidently, even “narrow” AI in the wild is nowhere near working just yet in scenarios where it needs to perform accurately 100% of the time, as most tragically evidenced by recent deaths involving self-driving cars.
So which one is it? The main characteristic of exponential technology accelerations is that they look like they’re in the distant future, until suddenly they’re not. Are we about to hit an inflection point?
A lot of my blog posts on AI have been about how to build AI applications and startups. In this post, I look a bit upstream at the world of AI research to try and understand who’s doing what work, and what may be coming down the pipe from the AI research labs. In particular, I was privileged to attend an incredible small-group workshop ahead of the Canonical Computation in Brains and Machines held at NYU a few weeks ago, which was particularly enlightening and informs some of the content in this post.
These are just my notes, intended for anyone in tech and startups who is generally curious about AI, as opposed to a technical audience. This is certainly a work in progress, and comments are most welcome.
Here is what I have learned so far.
More AI research, resources and compute than ever to figure out AGI
A lot has been written about the explosion of startup activity in AI, with a reported $15.2 billion of venture capital going to AI startups in 2017 (CB Insights), but the same has also been happening upstream in AI research.
The overall number of research papers published on AI has increased dramatically since 2012, to the point of generating projects like Arxiv Sanity Preserver, a browser to access some 45,000+ papers, launched by Andrej Karpathy “because things were seriously getting out of hand”.
NIPS, a highly technical conference started in 1987, once a tiny and obscure event, had 8,000 participants in 2017.
AI research is an increasingly global effort. In addition to the “usual suspect” US universities (e.g. MIT CSAIL lab), some of the most advanced AI research centers are located in Canada (particularly Toronto, with both University of Toronto and the new Vector Institute, and Montreal, including MILA), Europe (London, Paris, Berlin), Israel and, increasingly, China.
(Anecdotally, many in AI academia report increasingly meeting very impressive young researchers, including some teenagers, who are incredibly technically proficient and forward thinking in their research, presumably as a result of the democratization of AI tools and education).
The other major recent trend has been that fundamental AI research has been increasingly conducted in large Internet companies. The model of the company-sponsored lab, of course, is not new (think Bell Labs), but it has taken on a new dimension in AI research recently. Alphabet/Google have both DeepMind (a startup when it was acquired in 2014, now a 700-person group focused largely on fundamental AI research, run by Demis Hassabis) and Google Brain (started in 2011 by Jeff Dean, Greg Corrado and Andrew Ng, with more focus on applied AI). Facebook has FAIR, headed up by Yann LeCun, one of the fathers of deep learning. Microsoft has MSR AI. Uber has the Uber AI Labs, which came out of their acquisition of New York startup Geometric Intelligence. Alibaba has Alibaba A.I. Labs, Baidu has Baidu Research and Tencent has the Tencent AI Lab. The list goes on.
Those industry labs have deep resources and routinely pay millions to secure top researchers. One of the recurring themes in conversations with AI researchers is that, if it is hard for startups to attract students graduating with a PhD in machine learning, it’s even harder for academia to retain them.
Many of those labs are pursuing, explicitly or implicitly, AGI.
In addition, AI research, particularly in those industry labs, has access to two key resources at unprecedented levels: data and computing power.
The ever-increasing amount of data available to train AI has been well documented by now, and indeed Internet giants like Google and Facebook have a big advantage when it comes to developing broad horizontal AI solutions. Things are also getting “interesting” in China where massive pools of data are being aggregated to train AI for face recognition, with unicorn startups like Megvii (also known as Face++) and SenseTime as beneficiaries. In 2017, a plan called Xue Liang (“sharp eyes”) was announced; it involved centrally pooling and processing footage from surveillance cameras (both public and private) across over 50 Chinese cities. There are also rumors of aggregation of data across the various Chinese Internet giants for purposes of AI training.
Beyond data, another big shift that could precipitate AGI is a massive acceleration in computing power, particularly over the last couple of years. This is a result of progress both in terms of leveraging existing hardware, and building new high performance hardware specifically for AI, resulting in progress at a faster pace than Moore’s law.
To rewind a bit, the team that won the ImageNet competition in 2012 (the event that triggered much of the current wave of enthusiasm around AI) used 2 GPUs to train their network model. This took 5 to 6 days, and was considered the state of the art. In 2017, Facebook announced that it had been able to train ImageNet in one hour, using 256 GPUs. And mere months later, a Japanese team from Preferred Networks broke that record, training ImageNet in 15 minutes with 1024 NVIDIA Tesla P100 GPUs.
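To put those three data points side by side, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted above; the names (RUNS, summarize) are mine, and it assumes that "5 to 6 days" means 120 to 144 hours of continuous training. The runs used different GPU generations (and not necessarily the same network), so this is a rough wall-clock comparison rather than a benchmark.

```python
# Figures are the ones quoted in the paragraph above; names are illustrative.
RUNS = [
    # (label, number of GPUs, (min hours, max hours))
    ("2012 ImageNet winner", 2, (5 * 24, 6 * 24)),
    ("Facebook, 2017", 256, (1.0, 1.0)),
    ("Preferred Networks, 2017", 1024, (0.25, 0.25)),
]


def summarize(runs):
    """Return wall-clock speedup vs. the first run and total GPU-hours per run."""
    base_lo, base_hi = runs[0][2]
    rows = []
    for label, gpus, (lo, hi) in runs:
        rows.append({
            "label": label,
            # Smallest and largest speedup implied by the 5-to-6-day range.
            "speedup": (base_lo / hi, base_hi / lo),
            # GPU count multiplied by training time, as a (min, max) pair.
            "gpu_hours": (gpus * lo, gpus * hi),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(row["label"], "speedup:", row["speedup"], "GPU-hours:", row["gpu_hours"])
```

Read this way, the 2017 runs are somewhere between 120x and 576x faster on the clock, while the raw GPU-hour totals (240 to 288 in 2012, 256 for each 2017 run) land in the same ballpark. GPU-hours across hardware generations are not directly comparable, so I would not read too much into that second number.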
But this could be a mere warm-up, as the world is now engaged in a race to produce ever more powerful AI chips and the hardware that surrounds them. In 2017, Google released the second generation of its Tensor Processing Units (TPUs), which are designed specifically to speed up machine learning tasks. Each TPU can deliver 180 teraflops of performance (and be used for both inference and training of machine learning models). Those TPUs can be clustered to produce super-computers — a 1,000 cloud TPU system is available to AI researchers willing to openly share their work.
There is also tremendous activity at the startup level, with heavily-funded emerging hardware players like Cerebras, Graphcore, Wave Computing, Mythic and Lambda, as well as Chinese startups Horizon Robotics, Cambricon and DeePhi.
Finally, there’s emerging hardware innovation around quantum computing and optical computing. While still very early from a research standpoint, both Google and IBM announced some meaningful progress in their quantum computing efforts, which would take AI to yet another level of exponential acceleration.
The massive increase in computing power opens the door to training the AI with ever increasing amounts of data. It also enables AI researchers to run experiments much faster, accelerating progress and enabling the creation of new algorithms.
One of the key points that folks at OpenAI (Elon Musk’s nonprofit research lab) make is that AI already surprised us with its power when the algorithms were running on comparatively modest hardware a mere five years ago. Who knows what will happen with all this computing power? (See this excellent TWiML & AI podcast with Greg Brockman, CTO of OpenAI.)
AI algorithms, old and new
The astounding resurrection of AI that effectively started around the 2012 ImageNet competition has very much been propelled by deep learning. This statistical technique, pioneered and perfected by several AI researchers including Geoff Hinton, Yann LeCun and Yoshua Bengio, involves multiple layers of processing that gradually refine results (see this 2015 Nature article for an in depth explanation). It is an old technique that dates back to the 1960s, 1970s and 1980s, but it suddenly showed its power when fed enough data and computing power.
Deep learning powers just about any exciting AI product from Alexa to uses of AI in radiology to the “hot dog or not” spoof product from HBO’s Silicon Valley. It has proven remarkably effective at pattern recognition across a variety of problems — speech recognition, image classification, object recognition and some language problems.
From an AGI perspective, deep learning has stirred imaginations because it does more than what it was programmed to do, for example grouping images or words (like “New York” and “USA”) around ideas, without having been explicitly told there was a connection between such images or words (like “New York is located in the USA”). AI researchers themselves don’t always know exactly why deep learning does what it does.
Interestingly, however, as the rest of the world is starting to widely embrace deep learning across a number of consumer and enterprise applications, the AI research world is asking whether it is hitting diminishing returns. Geoff Hinton himself, at a conference in September 2017, questioned back-propagation, the backbone of neural networks which he helped invent, and suggested starting over, which sent shockwaves through the AI research world. A January 2018 paper by Gary Marcus presented ten concerns for deep learning and suggested that “deep learning must be supplemented by other techniques if we are to reach artificial general intelligence”.
Much of the discussion seems to have focused on “supervised” learning — the form of learning that requires being shown large amounts of labeled examples to train the machine on how to recognize similar patterns.
The AI research community now seems to agree that, if we are to reach AGI, efforts need to focus more on unsupervised learning — the form of learning where the machine gets trained without labeled data. There are many variations of unsupervised learning, including autoencoders, deep belief networks and GANs.
GANs, or “generative adversarial networks”, are a much more recent method, directly related to unsupervised deep learning, pioneered in 2014 by Ian Goodfellow, then a PhD student at University of Montreal. GANs work by creating a rivalry between two neural nets, trained on the same data. One network (the generator) creates outputs (like photos) that are as realistic as possible; the other network (the discriminator) compares the photos against the data set it was trained on and tries to determine whether each photo is real or fake; the first network then adjusts its parameters for creating new images, and so on and so forth. GANs have had their own evolution, with multiple versions of GAN appearing just in 2017 (WGAN, BEGAN, CycleGAN, Progressive GAN).
This last approach of progressively training GANs enabled Nvidia to generate high resolution facial photos of fake celebrities.
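For the curious, the generator/discriminator rivalry can be sketched in a few lines of Python. This is a deliberately tiny toy of my own, not how any of the systems above are built: the "photos" are single numbers drawn from a bell curve centered on 4, the generator is a straight line (x = a*z + b), the discriminator is a logistic function, and the generator uses the common "non-saturating" objective. All names and settings are assumptions made for illustration.

```python
import math
import random


def sigmoid(t):
    # Clamp the input so math.exp cannot overflow.
    t = max(-50.0, min(50.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def mean(values):
    return sum(values) / len(values)


def discriminator_step(d, real, fake, lr):
    """One gradient-ascent step on mean(log D(real)) + mean(log(1 - D(fake)))."""
    w, c = d
    s_real = [sigmoid(w * x + c) for x in real]
    s_fake = [sigmoid(w * x + c) for x in fake]
    grad_w = (mean([(1 - s) * x for s, x in zip(s_real, real)])
              - mean([s * x for s, x in zip(s_fake, fake)]))
    grad_c = mean([1 - s for s in s_real]) - mean(s_fake)
    return (w + lr * grad_w, c + lr * grad_c)


def generator_step(g, d, noise, lr):
    """One gradient-ascent step on mean(log D(G(z))): try to fool the discriminator."""
    a, b = g
    w, c = d
    fake = [a * z + b for z in noise]
    s_fake = [sigmoid(w * x + c) for x in fake]
    # d/dx log D(x) = (1 - D(x)) * w, chained through x = a*z + b.
    grad_a = mean([(1 - s) * w * z for s, z in zip(s_fake, noise)])
    grad_b = mean([(1 - s) * w for s in s_fake])
    return (a + lr * grad_a, b + lr * grad_b)


def train(steps=200, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    g = (1.0, 0.0)  # generator parameters (a, b): x = a*z + b
    d = (0.0, 0.0)  # discriminator parameters (w, c): D(x) = sigmoid(w*x + c)
    history = []
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]   # toy "real" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [g[0] * z + g[1] for z in noise]
        d = discriminator_step(d, real, fake, lr)  # discriminator learns to tell them apart
        g = generator_step(g, d, noise, lr)        # generator adjusts to fool it
        history.append((g, d))
    return history


if __name__ == "__main__":
    (a, b), (w, c) = train()[-1]
    print("generator offset b:", b, "discriminator slope w:", w)
```

In the very first round the discriminator learns that larger numbers look more "real", and the generator responds by shifting its outputs upward. A toy this simple is not guaranteed to settle down; it may well circle around the target rather than land on it, which is a small-scale version of the training instability GAN practitioners often complain about.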
Another related area that has seen considerable acceleration is reinforcement learning, a technique where the AI teaches itself how to do something by trying again and again, separating good moves (that lead to rewards) from bad ones, and altering its approach each time, until it masters the skill. Reinforcement learning is another technique that goes back as far as the 1950s, and was considered for a long time an interesting idea that didn’t work well. However, that all changed in late 2013 when DeepMind, then an independent startup, taught an AI to play 22 Atari 2600 games, including Space Invaders, at a superhuman level. In 2016, its AlphaGo, an AI trained with reinforcement learning, beat the South Korean Go master Lee Sedol. Then just a few months ago in December 2017, AlphaZero, a more generalized and powerful version of AlphaGo, used the same approach to master not just Go, but also chess and shogi. Without any human guidance other than the game rules, AlphaZero taught itself how to play chess at a master level in only four hours. Within 24 hours, AlphaZero was able to defeat all state of the art AI programs in those 3 games (Stockfish, elmo and the 3-day version of AlphaGo).
How close is AlphaZero to AGI? Demis Hassabis, the CEO of DeepMind, called AlphaZero’s playstyle “alien”, because it would sometimes win with completely counterintuitive moves like sacrifices. Seeing a computer program teach itself the most complex human games to a world-class level in a mere few hours is an unnerving experience that would appear close to a form of intelligence. One key counter-argument in the AI community is that AlphaZero is an impressive exercise in brute force: AlphaZero was trained via self-play using 5,000 first generation TPUs and 64 second generation TPUs; once trained it ran on a single machine with 4 TPUs. In reinforcement learning, AI researchers point out that the AI has no idea what it is actually doing (like playing a game) and is limited to the specific constraints that it was given (the rules of the game). Here is an interesting blog post disputing whether AlphaZero is a true scientific breakthrough.
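To make the "try again and again, keep what gets rewarded" loop concrete, here is a minimal sketch using tabular Q-learning, one of the simplest textbook reinforcement learning methods. To be clear, this is not what DeepMind used (their systems pair reinforcement learning with deep neural networks and, for AlphaGo and AlphaZero, tree search). The toy "game" is my own invention: five squares in a row, start on the left, and a reward of 1 for reaching the rightmost square. All names and settings are assumptions for illustration.

```python
import random

N_STATES = 5        # squares 0..4 on a line; square 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Mostly pick the move that looks best so far; sometimes explore at random."""
    if rng.random() < epsilon or q_row[0] == q_row[1]:
        return rng.randrange(len(ACTIONS))
    return 0 if q_row[0] > q_row[1] else 1


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
               max_steps=200, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates how good each move is from each square.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best known move for every non-goal square."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

After a few hundred attempts the learned policy is simply "always move right". Notice that nothing in the code knows what a game or a goal is: the program only keeps a table of numbers that it adjusts after each move.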
When it comes to AGI, or even the success of machine learning in general, several researchers have high hopes for transfer learning. Demis Hassabis of DeepMind, for example, calls transfer learning “the key to general intelligence”. Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second related task. The idea is that with the prior knowledge learned from the first task, the AI will perform better, train faster and require less labeled data than a new neural network trained from scratch on the second related task. Fundamentally, the hope is that it can help AI be more “general” and hop from task to task and domain to domain, particularly those where labeled data is less readily available (see a good overview here).
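Here is the simplest possible sketch of that idea in Python, under heavy assumptions of my own: the "model" is just a line (y = w*x + b), the first task is to fit y = 2x + 1, and the second, related task is to fit y = 2x + 1.5 from only three labeled points and five training steps. Real transfer learning usually means reusing layers of a neural network; this toy only illustrates why starting from what was learned on a related task can beat starting from scratch. All names are hypothetical.

```python
def mse(params, xs, ys):
    """Mean squared error of the line y = w*x + b on the given points."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(params, xs, ys, steps, lr=0.1):
    """Plain gradient descent on mean squared error for y = w*x + b."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


# Task A (the first task): learn y = 2x + 1 with a generous training budget.
xs_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys_a = [2 * x + 1 for x in xs_a]
pretrained = gradient_descent((0.0, 0.0), xs_a, ys_a, steps=200)

# Task B (the related task): y = 2x + 1.5, with only three labeled points
# and only five training steps.
xs_b = [-1.0, 0.0, 1.0]
ys_b = [2 * x + 1.5 for x in xs_b]
from_scratch = gradient_descent((0.0, 0.0), xs_b, ys_b, steps=5)
transferred = gradient_descent(pretrained, xs_b, ys_b, steps=5)

loss_scratch = mse(from_scratch, xs_b, ys_b)
loss_transfer = mse(transferred, xs_b, ys_b)

if __name__ == "__main__":
    print("from scratch:", loss_scratch, "with transfer:", loss_transfer)
```

With the same tiny budget on the second task, the model that starts from the first task's parameters ends up with a much lower error than the one that starts from zero. The catch, of course, is that the two tasks here are almost identical.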
For transfer learning to lead to AGI, the AI would need to be able to do transfer learning across increasingly far apart tasks and domains, which would require increasing abstraction. According to Hassabis “the key to doing transfer learning will be the acquisition of conceptual knowledge that is abstracted away from perceptual details of where you learned it from”. We’re not quite there as of yet. Transfer learning has been mostly challenging to make work — it works well when the tasks are closely related, but becomes much more complex beyond that. But this is a key area of focus for AI research. DeepMind made significant progress with its PathNet project (see a good overview here), a network of neural networks. As another example of interest from the field, just a few days ago, OpenAI launched a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience. The algorithms will be tested against 30 SEGA “old school” video games.
Recursive Cortical Networks (RCN) are yet another promising approach. Developed by Silicon Valley startup Vicarious, RCN were recently used to solve text-based CAPTCHAs with a high accuracy rate using significantly less data than their counterparts: 300x less in the case of a scene text recognition benchmark (see Science article, December 8, 2017).
There are many more methods being contemplated, developed or re-explored in light of the most recent technological progress, including in no particular order: Geoff Hinton’s capsule networks or CapNets (approachable explanation involving Kim Kardashian here), neural attention models (approachable explanation without Kim Kardashian here), one shot learning, differentiable neural computers (DNC), neuroevolution, evolutionary strategies,… the list goes on, as further testament to the explosive vitality of AI research.
The fusion of AI and neuroscience
All the techniques described so far are essentially mathematical and statistical in nature and rely on a lot of computing power and/or data to reach success. While considerable prowess has been displayed in creating and improving such algorithms, a common criticism against those methods is that machines are still not able to start from, or learn, principles. AlphaZero doesn’t know it is playing a game, or what a game is, for that matter.
A growing line of thinking in research is to rethink core principles of AI in light of how the human brain works, including in children. While originally inspired by the human brain (hence the term “neural”), neural networks separated pretty quickly from biology — a common example is that back propagation doesn’t have an equivalent in nature.
Teaching a machine how to learn like a child is one of the oldest ideas of AI, going back to Turing and Minsky in the 1950s, but progress is being made as both the field of artificial intelligence and the field of neuroscience are maturing.
This intersection of AI and neuroscience was very much the theme of the “Canonical Computation in Brains and Machines” workshop I alluded to earlier. While both fields are still getting to know each other, it was clear that some of the deepest AI thinkers are increasingly focused on neuroscience-inspired research, including deep learning godfathers Yann LeCun (video: What are the principles of learning in newborns?) and Yoshua Bengio (video: Bridging the gap between deep learning and neuroscience).
A particularly promising line of research comes from Josh Tenenbaum, a professor of Cognitive Science and Computation at MIT. A key part of Tenenbaum’s work has been to focus on building quantitative models of how an infant or child learns (including in her sleep!), as opposed to what she inherits from evolution, in particular what he calls “intuitive physics” and “intuitive psychology”. His work has been propelled by progress in probabilistic languages (part of the Bayesian world) that incorporate a variety of methods such as symbolic languages for knowledge representation, probabilistic inference for reasoning under uncertainty and neural networks for pattern recognition. (Videos: “Building machines that learn and think like people” and “Building machines that see, learn, and think like people”)
While MIT just launched in February an initiative called MIT Intelligence Quest to help “crack the code of intelligence” with a combination of neuroscience, cognitive science, and computer science, all of this is still very much lab research and will most likely require significant patience to produce results applicable to the real world and industry.
So, how far are we from AGI? This high level tour shows contradictory trends. On the one hand, the pace of innovation is dizzying: many of the developments and stories mentioned in this piece (AlphaZero, new versions of GANs, capsule networks, RCNs breaking CAPTCHA, Google’s 2nd generation of TPUs, etc.) occurred just in the last 12 months, in fact mostly in the last 6 months. On the other hand, many in the AI research community itself, while actively pursuing AGI, go to great lengths to emphasize how far we still are, perhaps out of concern that the media hype around AI may lead to dashed hopes and yet another AI nuclear winter.
Regardless of whether we get to AGI in the near term or not, it is clear that AI is getting vastly more powerful, and will get even more so as it runs on ever more powerful computers, which raises legitimate concerns about what would happen if its power was left in the wrong hands (whether human or artificial). One chilling point that Elon Musk was making in the “Do you trust this computer?” documentary was that AI didn’t even need to want to be hostile to humans, or even know what humans are, for that matter. In its relentless quest to complete a task by all means, it could be harmful to humans just because they happened to be in the way, like roadkill.
Leaving aside physical harm, progress in AI leads to a whole series of more immediate dangers that need to be thoroughly thought through — from significant job losses across large industries (back offices, trucking) to a complete distortion of our sense of reality (when fake videos and audio can be easily created).
Photo / chart credits: GOOGLE/CONNIE ZHOU A row of servers in Google’s data center (with a cooling off system powered by Google Brain) in Douglas County, Ga. Arxiv-sanity chart found on this blog post.