Generative AI Model: GANs (Part 1)

Written by jyotiyadav33111 | Published 2024/05/28

TL;DR: This blog covers the model components of Generative Adversarial Networks (GANs). A GAN is an unsupervised learning technique trained on a dataset of images with no labels. Generative AI is a subset of Artificial Intelligence that can generate new data. Let's dive into the model architecture.

Welcome to my series of blogs on a very interesting but complex topic: Generative AI. The topic has numerous components and specifications, which makes it difficult to grasp all at once, so this series will explain exactly one topic per blog.

Let’s start! This blog talks about the model components of Generative Adversarial Networks (GANs), a basic but fascinating area of the Generative AI domain. Understanding them is the first and foremost step into the world of Generative AI. A GAN is an unsupervised learning technique: the training data is a set of images with no labels. Let’s dive deep into the model architecture.

What is Generative AI?

It is a subset of Artificial Intelligence that can generate new data based on input data. It identifies patterns and structure in the data and uses them to generate synthetic data that looks realistic. Generative AI has various applications:

  1. Image and video generation
  2. Text generation
  3. Data augmentation
  4. Speech and audio synthesis

Model Architecture

A GAN is composed of two components: a Generator and a Discriminator. The two components compete with each other to optimise their respective parameters. The model is provided with real images, and to make the Discriminator better and better, the Generator creates fake images as well. In the examples below, the fake label is represented by 0 and the real label by 1.
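As a minimal sketch of this label convention, assuming PyTorch and binary cross-entropy as the loss, the real/fake labels might be set up like this:

```python
import torch
import torch.nn as nn

batch_size = 8                              # illustrative batch size
real_labels = torch.ones(batch_size, 1)     # real images are labelled 1
fake_labels = torch.zeros(batch_size, 1)    # fake images are labelled 0
criterion = nn.BCELoss()                    # compares the Discriminator's probabilities to these labels
```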

Generator:

The Generator's job is to take random noise and create fake images out of it. These fake images are passed to the Discriminator, which outputs the probability of each image being real. The Discriminator's feedback is then used to train the Generator so that it can deceive the Discriminator.

Step by step process description:

  1. Generate Random Noise: To generate the fake images, a random input is passed to the Generator. This input is referred to as noise, and it serves as the raw material for creating the fake images.
  2. Creation of Fake Images: The Generator uses multiple layers of a neural network to transform the noise, step by step, into a recognisable image.
  3. Send Fake Images to the Discriminator: Once the images are generated, they are passed to the Discriminator, whose job is to identify whether each image is real (output = 1) or fake (output = 0).
  4. Feedback Sent Back to the Generator: Just like any other neural network, the feedback from the Discriminator is sent back to the Generator, which adjusts its image generation to make it more and more difficult for the Discriminator to tell real from fake. A minimal code sketch of such a Generator follows this list.
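Here is a minimal Generator sketch in PyTorch. The layer widths, the 100-dimensional noise vector, and the 28x28 output size are illustrative assumptions, not a specific published architecture.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100, img_dim=28 * 28):
        super().__init__()
        # Step-by-step transformation of the noise into an image-sized vector
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, img_dim),
            nn.Tanh(),                        # pixel values scaled to [-1, 1]
        )

    def forward(self, noise):
        return self.net(noise).view(-1, 1, 28, 28)  # reshape to an image

# Usage: turn a batch of random noise into fake images
noise = torch.randn(16, 100)                  # step 1: random noise
fake_images = Generator()(noise)              # step 2: fake images, shape (16, 1, 28, 28)
```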

Discriminator:

For Discriminator training, both fake and real images are supplied to the Discriminator. The Discriminator then tries to identify whether each image is fake or real, and the corresponding loss is used to update its parameters.

Step by step process description:

  1. Input: The Discriminator receives both real images (from the training dataset) and fake images (created by the Generator). This mixed batch is used for training.
  2. Image Identification: The model uses a dense neural network to identify whether an image is real or fake.
  3. Calculate Loss Based on Predictions: The loss is low when the Discriminator identifies real and fake images correctly, and it increases as the number of wrong identifications grows.
  4. Parameter Update: The calculated loss is back-propagated and used to update the Discriminator's parameters, using an optimisation algorithm such as gradient descent. A minimal code sketch of such a Discriminator follows this list.
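Here is a minimal Discriminator sketch in PyTorch, using dense layers as described above. The layer sizes and the 28x28 input are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                     # flatten the image into a vector
            nn.Linear(img_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),                     # probability that the image is real
        )

    def forward(self, images):
        return self.net(images)

# Usage: score a mixed batch and compute the loss against the true labels
D = Discriminator()
images = torch.randn(8, 1, 28, 28)            # stand-in for a mixed real/fake batch
labels = torch.tensor([[1.], [1.], [1.], [1.], [0.], [0.], [0.], [0.]])
loss = nn.BCELoss()(D(images), labels)        # lower when predictions match the labels
loss.backward()                               # gradients for the parameter update
```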

The Adversarial Process

The back-and-forth process between the Generator and the Discriminator is what makes GANs unique and effective. The concept is derived from Game Theory and is framed as a zero-sum game: one model's loss is the other model's gain. In the process, both models improve, leading to highly realistic fake images and a highly accurate Discriminator.
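Formally, this game is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014): the Discriminator D tries to maximise the value function below, while the Generator G tries to minimise it.

```latex
\min_G \max_D \; V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_z(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```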

But how is equilibrium reached in this contest? Or, how do we know that the model is well trained?

The answer is simple. Since the original task of a GAN is to generate realistic images, equilibrium is reached when the Generator produces images that the Discriminator cannot distinguish from real ones any better than a random guess, i.e. its accuracy hovers around 50%.
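Putting the two sketches above together, a minimal alternating training loop might look like the following. The `dataloader` of real 28x28 images, the learning rates, and the rough equilibrium check at the end are all assumptions for the sketch, not a definitive implementation.

```python
import torch
import torch.nn as nn

# Reuses the Generator and Discriminator sketches from earlier sections.
G, D = Generator(), Discriminator()
criterion = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

for real_images in dataloader:                       # assumed source of real images
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Train the Discriminator on a mixed batch of real and fake images
    fake_images = G(torch.randn(batch, 100)).detach()  # detach: don't update G here
    d_loss = criterion(D(real_images), real_labels) + \
             criterion(D(fake_images), fake_labels)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Train the Generator to make the Discriminator output "real" (1) on fakes
    g_loss = criterion(D(G(torch.randn(batch, 100))), real_labels)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

    # Rough equilibrium check: D's prediction on fakes drifts towards 0.5
    with torch.no_grad():
        print(f"D(fake) mean: {D(fake_images).mean().item():.2f}")
```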

This section ends here, another one will be coming soon!


Written by jyotiyadav33111 | Senior Data Scientist @ blockchain
Published by HackerNoon on 2024/05/28