Author, GANs in Action @ManningBooks 📖 • CompSci @Harvard • Data/ML Product Manager
It started in a pub. Ian Goodfellow and his fellow doctoral students at the University of Montreal met to celebrate the completion of the 2014 academic year. Their conversation focused on the frontier of AI research: synthetic data generation; specifically, approaches that would allow a computer to produce realistic-looking images. Compared to everything machines have mastered, this problem may not seem worthy of the attention of some of the finest minds in AI research. However, they recognized the profound implications of computers’ shortcomings, which lay hidden beneath the mountain of all that machines could accomplish.
Machine learning algorithms have been great at recognizing patterns in existing data and using that insight for tasks such as classification and prediction. When asked to generate new data, computers have struggled. An algorithm can defeat a chess grandmaster or classify whether a credit card transaction is likely to be fraudulent; in contrast, any attempt at making small talk with Amazon’s Alexa or Apple’s Siri is doomed. Indeed, humanity’s most basic and essential capacities — e.g., a convivial conversation or the crafting of an original creation — can leave even the most sophisticated supercomputers in digital spasms.
Ian’s colleagues were considering complex statistical methods to help computers grasp the various elements that constitute an image — a step they considered a prerequisite for generating realistic-looking data.¹ In order to create something, they reasoned, a computer must understand it first. Ian eschewed the arduous approach of having a researcher improve an algorithm’s understanding of images and embarked on a novel path; he conceived of a way to direct another algorithm to do the teaching. In doing so, Ian leveraged what machines do well (recognize existing data) to overcome what they do poorly (produce new data).
The resulting machine learning model, which Ian implemented after returning home from the pub, came to be known as GAN (Generative Adversarial Network). The word “generative” indicates the overall purpose of the model: creating new data. The data that a GAN will learn to generate depends on the choice of the training set — for example, if we want a GAN to paint like Leonardo da Vinci, we would use a training dataset of Leonardo’s artwork.
The term “adversarial” points to the game-like, competitive dynamic between the two algorithms that constitute the GAN framework: the Generator and the Discriminator. The Generator’s goal is to create examples that are indistinguishable from the real data in the training set. In our example, this means producing paintings that look just like Leonardo’s. The Discriminator’s objective is to tell the fake examples produced by the Generator from the real examples coming from the training dataset. In our example, the Discriminator plays the role of an art expert assessing the authenticity of paintings believed to be Leonardo’s. The two networks are constantly trying to outwit one another: the better the Generator gets at creating convincing data, the better the Discriminator needs to be at distinguishing real examples from the fake ones.
Lastly, the word “networks” indicates the class of machine learning models most commonly used to represent the Generator and the Discriminator: neural networks. As their name suggests, these models are loosely inspired by the human brain — analogous to the nervous system, they use a set of interconnected nodes, or “neurons,” to process their computations.
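The tug-of-war between the Generator and the Discriminator can be sketched in a few lines of code. What follows is a toy illustration under loud assumptions, not the implementation from the original paper: the "real data" are single numbers drawn from a bell curve centered at 4 (standing in for Leonardo's paintings), the Generator is a one-line function `a*z + b` that reshapes random noise, and the Discriminator is a single "neuron" that outputs the probability that a number is real. The function names, starting values, learning rate, and the choice to have the Generator maximize log D(fake) (a commonly used variant) are all picked for illustration only.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function, returns a value in [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=32, lr=0.05, seed=0):
    """Train a hypothetical 1-D GAN and return ((a, b), (w, c)).

    Generator:      G(z) = a*z + b           (forger)
    Discriminator:  D(x) = sigmoid(w*x + c)  (art expert)
    Real data:      samples from N(4, 1)     (assumed stand-in for the training set)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # Generator parameters (assumed starting point)
    w, c = 0.1, 0.0   # Discriminator parameters (assumed starting point)

    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: gradient ascent on
        # log D(real) + log(1 - D(fake)), i.e. get better at telling them apart.
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x
            grad_c += (1.0 - d)
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w -= d * x
            grad_c -= d
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(fake),
        # i.e. make the fakes look more real to the updated Discriminator.
        grad_a = grad_b = 0.0
        for z in noise:
            d = sigmoid(w * (a * z + b) + c)
            grad_a += (1.0 - d) * w * z
            grad_b += (1.0 - d) * w
        a += lr * grad_a / batch
        b += lr * grad_b / batch

    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print("Generator: G(z) = %.2f * z + %.2f" % (a, b))
    print("Discriminator: D(x) = sigmoid(%.2f * x + %.2f)" % (w, c))
```

Each pass through the loop mirrors the forger-and-expert dynamic: the Discriminator first adjusts itself using a fresh batch of real and fake numbers, and the Generator then nudges its output in whatever direction the Discriminator currently finds more convincing. Real GANs replace both one-line functions with deep neural networks and the single numbers with images, but the alternating structure is the same.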
Although the mathematics underpinning GANs is fairly complex, there are many real-world analogies that may make the intuition behind them easier to understand. Above, we discussed the example of an art forger (the “Generator”) trying to fool an art expert (the “Discriminator”). The more convincing the fake paintings the forger makes, the better the art expert must be at determining their authenticity. The reverse is true as well: the better the art expert is at telling whether a particular painting is genuine, the more the forger must improve his or her craft to avoid being caught red-handed.
Another metaphor often used to describe GANs, one that Ian himself likes to use, is that of a criminal (the “Generator”) who forges money and a detective (the “Discriminator”) who tries to catch him. The more authentic-looking the counterfeit bills become, the better the detective must be at detecting them, and vice versa.
Ever since Ian and his co-authors published a paper detailing his invention, GANs have been hailed by academics and industry experts as one of the most important innovations in deep learning. Yann LeCun, the Director of AI Research at Facebook, went as far as to say that GANs and their variations are “the coolest idea in deep learning in the last 20 years.”²
The excitement is well justified. GANs have achieved remarkable results that were long thought impossible for artificial systems, such as generating photorealistic images or turning video footage of a horse into footage of a running zebra, all without the need for vast troves of painstakingly labeled training data. Unlike other advancements in machine learning that may be household names among researchers but would elicit no more than a quizzical look from anyone else, GANs have captured the imagination of researchers and the wider public alike. Indeed, they have been covered by The New York Times, the BBC, Scientific American, and many other prominent media outlets. As yet another testament to the technology’s allure, a portrait produced by a GAN was recently sold at a Christie’s auction for over $400,000.³
Some of the spotlight focuses on the technology’s potential for mischief. At the end of an aptly titled piece about GANs — “How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos”⁴ — the New York Times journalists Cade Metz and Keith Collins discuss the worrying prospect of GANs’ being exploited to create and spread convincing misinformation, including fake video footage of statements by world leaders. Martin Giles, the San Francisco bureau chief of MIT Technology Review, echoes their concern and mentions another potential risk: in the hands of skilled hackers, GANs can be used to intuit and exploit system vulnerabilities at an unprecedented scale.⁵
Other applications of GANs are less ominous, even beneficial. The online giant Amazon is experimenting with harnessing GANs for fashion recommendations — by analyzing countless outfits, the system will learn to produce new items matching any given style.⁶ In medical research, GANs are used to augment datasets to improve diagnostic accuracy⁷ and even to aid new drug discovery.⁸ In game development, GANs can be leveraged to create new game levels and characters dynamically — without the need for human programmers and UX designers.⁹ GANs are also seen as an important stepping stone toward achieving so-called “artificial general intelligence,”¹⁰ an artificial system capable of matching human cognitive capacity to acquire expertise in virtually any domain — from motor skills involved in walking to language and creative skills needed to compose sonnets.
When future historians look back at the fateful day Ian went out drinking with his friends, it remains to be seen whether they will wish he had stayed home and that the idea of two dueling neural networks had never occurred to him. Only the coming years will tell if the fears about GANs’ misuse will prove justified, or if any of the experimental applications will find its way to improving the lives of patients, optimizing creative workflows, or ushering in an era of sentient supercomputers. What is certain is that GANs have unlocked a vast array of research directions and applications whose impact will not be restricted to academia alone. Perhaps it is only fitting that GANs were invented in a pub, because we all may need a drink before this is over.