Generative Adversarial Networks — A Deep Learning Architecture

Written by gautamrbharadwaj | Published 2017/11/06


Generative Adversarial Nets, or GANs for short, are neural networks first introduced by Ian Goodfellow in 2014. The algorithm has been hailed as an important milestone in deep learning by many AI pioneers. Yann LeCun (a father of convolutional neural networks) has called GANs the coolest thing to happen in deep learning in the last 20 years. Many variants of the GAN have since appeared, such as DCGAN, Sequence-GAN, and LSTM-GAN.

GANs are composed of two neural networks competing with each other: a generator, which produces data, and a discriminator, which validates that data. The goal is to generate data points similar to those in the training set. In one striking demonstration, a GAN generated images from a text caption alone: given the text "A white bird with a black crown and yellow beak", the network produced a matching image by itself.

GANs have also been used to predict future frames in a video.

Apart from generating images, GANs can also perform various abstract operations, such as removing glasses from the image of a face or adding them, as shown below.

The figure above demonstrates that GANs can learn a representation that separates the concept of gender from that of wearing glasses. If we start with the representation of a man with glasses, subtract the vector representing a man without glasses, and add the vector representing a woman without glasses, we obtain the vector representing a woman with glasses. The generative model correctly decodes all of these representation vectors into images.

Let's take a simple example to relate to how GANs work. Consider what happened during demonetization in India: a money-counterfeiting criminal with fake notes on one side, and the banks on the other. What are the objectives of the criminal and of the banks with respect to counterfeit money? Let's enumerate:

  • To be a successful counterfeiter, the criminal wants to fool the bank officials, so that they can't tell the difference between counterfeit money and real money
  • To be successful, the bank officials want to detect counterfeit money as soon as possible

Here we have a clash of interests. This kind of situation can be modeled as a minimax game in game theory, and the process is called an adversarial process.
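The minimax game from the original 2014 paper can be written as a single value function V(D, G) that the discriminator tries to maximize and the generator tries to minimize. Here is a minimal sketch that evaluates V for hand-picked discriminator outputs (the probabilities are made-up numbers purely for illustration):

```python
import math

def value_fn(d_real_probs, d_fake_probs):
    """GAN minimax value V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
    The discriminator tries to maximize it; the generator minimizes it."""
    real_term = sum(math.log(p) for p in d_real_probs) / len(d_real_probs)
    fake_term = sum(math.log(1 - p) for p in d_fake_probs) / len(d_fake_probs)
    return real_term + fake_term

# A confident discriminator (real -> ~1, fake -> ~0) drives V toward 0,
# its maximum; a fooled one (0.5 everywhere) gives V = 2 * log(0.5).
confident = value_fn([0.99, 0.98], [0.01, 0.02])
fooled = value_fn([0.5, 0.5], [0.5, 0.5])
print(confident, fooled)  # confident is close to 0, fooled ≈ -1.386
```

At the game's equilibrium the discriminator is maximally confused, which corresponds to the "fooled" case above.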

What are Generative Adversarial Nets?

A GAN is a special case of an adversarial process where both components (the bank officials and the criminal, in our analogy) are neural nets. The first net generates data, and the second net tries to tell the difference between real data and the fake data generated by the first net. The second net outputs a scalar in [0, 1], representing the probability that the input is real data.
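The discriminator's output is kept in [0, 1] by squashing a raw real-valued score through a sigmoid at its final layer. A quick sketch of that squashing step (the scores here are arbitrary examples, not outputs of any real model):

```python
import math

def sigmoid(score):
    """Squash a raw discriminator score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

# Positive scores mean "probably real", negative "probably fake".
print(sigmoid(4.0))   # close to 1: confident the input is real
print(sigmoid(-4.0))  # close to 0: confident the input is fake
print(sigmoid(0.0))   # exactly 0.5: cannot tell real from fake
```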

The basic idea of generative modeling is to take a collection of training examples and form some representation that explains where those examples came from. There are two basic things one can do with a generative model. One is density estimation: take a collection of points and infer the probability distribution that generated them. The other is sample generation: build a machine that observes many samples from a distribution and learns to create more samples from that same distribution. GANs take the second route: they generate samples rather than explicitly recovering a density function.

Architecture

The generator tries to generate fake images that fool the discriminator into thinking they're real, and the discriminator tries to distinguish between real and generated images. They both get stronger together, until the discriminator can no longer tell real images from generated ones. At that point, the GAN can produce realistic images.
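To make the two-network architecture concrete, here is a toy sketch in NumPy with hypothetical layer sizes (8-dimensional noise, 16 hidden units, 1-D data); real GANs use much larger networks and a deep-learning framework, but the shapes of the two players are the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: z has 8 dims, hidden layers 16 units, data is 1-D.
Z_DIM, HIDDEN, DATA_DIM = 8, 16, 1

# Generator parameters: map a noise vector z to a fake data point.
Gw1 = rng.normal(0, 0.1, (Z_DIM, HIDDEN))
Gw2 = rng.normal(0, 0.1, (HIDDEN, DATA_DIM))
# Discriminator parameters: map a data point to a probability of "real".
Dw1 = rng.normal(0, 0.1, (DATA_DIM, HIDDEN))
Dw2 = rng.normal(0, 0.1, (HIDDEN, 1))

def generator(z):
    h = np.tanh(z @ Gw1)                   # hidden representation of the noise
    return h @ Gw2                         # fake sample, same shape as real data

def discriminator(x):
    h = np.tanh(x @ Dw1)
    return 1 / (1 + np.exp(-(h @ Dw2)))    # sigmoid -> probability in (0, 1)

z = rng.normal(size=(4, Z_DIM))            # a batch of 4 noise vectors
fake = generator(z)
print(fake.shape, discriminator(fake).shape)  # (4, 1) (4, 1)
```

During training, gradients of a loss flow through both functions to update the four weight matrices; only the untrained forward passes are shown here.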

Zhu and others (2016) developed an interactive application called interactive generative adversarial networks (iGAN). A user can draw a rough sketch of an image, and iGAN tries to produce the most similar realistic image. In one example, the user scribbles a few green lines that iGAN converts into a grassy field, and draws a black triangle that iGAN turns into a detailed mountain. Applications that create art are one of many reasons to study generative models that create images. A video demonstration of iGAN is available online.

Training Procedure

The training process consists of sampling data from the training set and running the discriminator on those inputs. The discriminator is any differentiable function whose parameters we can learn with gradient descent, so we usually represent it as a deep neural network, though in principle it could be another kind of model.

When the discriminator is applied to images from the training set, its goal is to output a value near 1, representing a high probability that the input was real rather than fake. But half of the time we also apply the discriminator to examples that are in fact fake. In that case, we begin by sampling a vector z from the prior distribution. z is essentially a vector of unstructured noise; it is the source of randomness that allows the generator to output a wide variety of different samples. We then apply the generator function G to the input vector z. Like the discriminator, the generator is a differentiable function whose parameters can be learned by gradient descent.
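The prior over z is typically something simple, such as a standard Gaussian or a uniform distribution. A minimal sketch of that sampling step (the 100-dimensional size is an arbitrary choice for illustration):

```python
import random

def sample_z(dim):
    """Draw one unstructured noise vector from a standard normal prior."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

# Each draw gives the generator a fresh source of randomness, so different
# z vectors map to different generated samples.
z1, z2 = sample_z(100), sample_z(100)
print(z1 == z2)  # False: two independent draws almost surely differ
```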

Usually, the generator is represented as a deep neural network. After G is applied to z, we obtain a sample from the model, and ideally this will resemble actual samples from the data set, as in the examples shown above. After the sample is obtained, the discriminator function D is applied again, this time to G(z). Here the two players' goals conflict: the discriminator wants D(G(z)) to be near 0, rejecting the sample as fake, while the generator wants D(G(z)) to be near 1, fooling the discriminator into thinking the sample is real.

In simple words, the generator asks the discriminator for suggestions: intuitively, the discriminator's gradients tell the generator how much to tweak each pixel to make the image a little more realistic.
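The opposing goals above are usually encoded as two cross-entropy losses: one for the discriminator (real targets 1, fake targets 0) and one for the generator (the non-saturating form from the original paper, which pushes D(G(z)) toward 1). A sketch, with the probabilities below being illustrative values rather than real model outputs:

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator loss: push D(x) toward 1 and D(G(z)) toward 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Generator loss (non-saturating form): push D(G(z)) toward 1,
    i.e. fool the discriminator into calling the fake sample real."""
    return -math.log(d_fake)

# The more confidently the discriminator rejects a fake (D(G(z)) -> 0),
# the larger the generator's loss, giving it a strong signal to improve.
print(g_loss(0.1) > g_loss(0.9))  # True
```

Training alternates gradient steps on these two losses: one step lowering d_loss for the discriminator, then one step lowering g_loss for the generator.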

Conclusion

In conclusion, GANs are generative models that use supervised learning techniques to approximate an intractable cost function, and they can simulate many cost functions, including the one used for maximum likelihood. GANs are key ingredients in algorithms that generate compelling high-resolution samples from diverse image classes. There are other generative algorithms, such as variational autoencoders and WaveNet, but each has its disadvantages. Variational autoencoders tend to generate lower-quality samples. WaveNet produces high-quality data, but synthesis is very slow: generating one second of audio can take on the order of minutes of computation. So GANs have a serious advantage over these alternatives. It's a remarkable family of algorithms, used not only for images but also in areas such as cyber security.

References

  1. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Networks.
  2. Goodfellow, I. (2016). NIPS 2016 Tutorial: Generative Adversarial Networks.

If you’d like to follow my work on Deep Learning, AI, and Reinforcement Learning, follow me on Medium Gautam Ramachandra, or on Twitter gautam1858.

