Deep Learning has been on the rise for some time, and people have recently started using it in many fields. If you want to dive straight into the code, go to Part 2.
In this series, you will learn to solve a simple problem: detecting a single object (like a cat or a dog) in an image. Along the way, you will learn about one type of Deep Learning, and you will learn to code in Keras and TensorFlow, two of the best-known libraries in this field. I am not going to talk about the maths behind Deep Learning. The series has two parts. The first part covers the basics of Deep Learning and its gotchas. In the second part, we will look at how to create your own models in Keras.
Before we begin, let me introduce myself. I am a Computer Science engineer, currently working at Practo. Earlier, I worked on games for the Facebook platform (back when that was a thing) and later on mobile games.
So what is Deep Learning? Why is it called Deep? Is the system actually learning?
Let’s start with a bit of history. Deep Learning is the latest cool word for Neural Networks, which have been around since the 60s. If you don’t know what a Neural Network is, don’t worry; I’ll explain it later in this article. Around 2006, a brilliant researcher called Geoffrey Hinton, along with others, published a paper with an interesting implementation of one type of Neural Network. In 2012, two of Hinton’s students won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a huge margin over their nearest competitor. This showed the entire world that Hinton’s work could solve very interesting problems.
We are trying to solve Image Classification. Classification means taking an image and working out what it contains. The current scope limits the solution to images that contain only one type of object: each image shows either a cat or a dog. For simplicity’s sake, we are not going to classify images such as a dog sitting in a car.
A Neural Network consists of a number of neurons arranged in layers, with each layer connected to the next in sequence. An input image enters at one end, and the network outputs its decision about the image’s class at the other. Training a network means passing a lot of images of the various classes through it as inputs. Each of these images is already tagged with its class.
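To picture data flowing through the layers, here is a minimal NumPy sketch, assuming tiny 8x8 grayscale "images" and random (untrained) weights. The data, labels, layer sizes and the `predict` helper are all hypothetical. Each layer here is just a weight matrix; real networks also put an activation function between layers, which is covered further down.

```python
import numpy as np

CLASSES = ["cat", "dog"]  # the two classes in our problem

rng = np.random.default_rng(seed=0)

# Hypothetical training data: four tiny 8x8 grayscale "images",
# each already tagged with a class index (0 = cat, 1 = dog).
images = rng.random((4, 8, 8))
labels = np.array([0, 1, 0, 1])

# A network is a sequence of layers; here each layer is just a weight matrix.
# 64 input pixels -> 16 hidden neurons -> 2 output scores (one per class).
layers = [rng.normal(size=(64, 16)), rng.normal(size=(16, 2))]

def predict(image):
    """Pass one image through every layer in order and return a class name."""
    x = image.reshape(-1)          # flatten 8x8 pixels into a vector of 64
    for w in layers:
        x = x @ w                  # the output of one layer feeds the next
    return CLASSES[int(np.argmax(x))]  # the class with the highest score wins

for image, label in zip(images, labels):
    print(f"tagged: {CLASSES[label]:>3}  predicted: {predict(image)}")
```

Because the weights are random, the predictions are arbitrary at this point; training is what makes them match the tags.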
Basic figure depicting cross section of a Convolutional Neural Network
At its core, a Neural Network is a simple mathematical formula that looks something like this:
x * w = y
Assume x is your input image and y is the correct class for that image, as tagged in the training data. x is constant because there is only a fixed set of images, and y is fixed by the tags. The only thing we can change is w, which we call the weights of a single layer of neurons. Training consists of two parts: the forward pass and backpropagation. In the forward pass, we give images to the network as input (x) and the network produces an output class y’. How far y’ is from y is the error of the network. In backpropagation, the network tries to reduce the error by tweaking the weights w. In the literature, w also goes by names such as parameters, kernel or filter. (It is not a hyperparameter: that term refers to settings chosen before training, such as the learning rate.) The problem with plain neural networks is that every layer passes the entire data on to the next layer, which gets very expensive for large images. To solve this, we are going to use Convolutional Neural Networks. So what is convolution? Let’s see below.
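The forward pass and the error can be written in a few lines of NumPy. This is a minimal sketch, assuming a batch of three tiny "images" flattened to four pixels each, with the class encoded as a single number. The article does not name an error measure; mean squared error is an assumption here, and the data is hypothetical.

```python
import numpy as np

# x: a batch of 3 tiny "images", each flattened to 4 pixel values (fixed input).
x = np.array([[0.0, 1.0, 0.5, 0.2],
              [1.0, 0.0, 0.3, 0.9],
              [0.4, 0.4, 0.8, 0.1]])

# y: the correct class for each image (tagged in advance), as a single number.
y = np.array([1.0, 0.0, 1.0])

# w: the weights, the only thing training is allowed to change.
w = np.zeros(4)

def forward(x, w):
    """Forward pass: y' = x * w (a matrix-vector product for the whole batch)."""
    return x @ w

def error(y_pred, y):
    """Mean squared error: how far y' is from y, averaged over the batch."""
    return np.mean((y_pred - y) ** 2)

y_pred = forward(x, w)
print(y_pred)            # [0. 0. 0.]  (all-zero weights give all-zero outputs)
print(error(y_pred, y))  # 0.6666666666666666, i.e. (1 + 0 + 1) / 3
```

Backpropagation's job is then to change w so that this number goes down.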
Convolution Layer
Ordinary Neural Networks are fully connected, which means that every neuron in one layer is connected to every neuron in the next, so each layer receives and processes the entire output of the previous one. This works for small images such as 8x8 or even 36x36 pixels, but real-world images are often 1024x768 or larger, and then the computation becomes huge. Fortunately, images are generally stationary in nature: the statistics of one part of an image are similar to those of any other part. So a feature learned in one region can be used to match the same pattern in another region. We therefore take a small window (for example, 3x3 pixels) and slide it over every position in the big image. At each position, we convolve the window with the pixels underneath it, combining them into a single value. You can imagine a big box of data becoming a smaller box of data for the next layer of neurons. Because the same small window of weights is reused at every position, the layer needs far fewer weights than a fully connected one, which makes computation much faster. This small window is called a filter (or kernel), and its values are learned through back propagation (we will come to that in a bit).
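The sliding-window idea can be sketched directly in NumPy. This is a minimal, unoptimised illustration, assuming no padding and a stride of 1; the 5x5 image and the vertical-edge filter are hypothetical, since in a real network the filter values are learned. The last two lines compare weight counts for one fully connected neuron on a 1024x768 RGB image versus one 3x3 filter.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over every position of `image` (no padding, stride 1).

    At each position, multiply the window element-wise with the kernel and
    sum the products into a single output value. Like deep learning
    libraries, this does not flip the kernel (i.e. it is cross-correlation).
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# A 5x5 image with a vertical edge: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

print(conv2d(image, kernel))
# [[3. 3. 0.]
#  [3. 3. 0.]
#  [3. 3. 0.]]

# Why this saves work: weights needed to look at a 1024x768 RGB image.
dense_weights_per_neuron = 1024 * 768 * 3   # 2,359,296 for ONE dense neuron
conv_weights_per_filter = 3 * 3 * 3         # 27, reused at every position
```

The 5x5 input shrinks to a 3x3 output, and the filter fires only where the edge sits under the window.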
Next up is pooling. Pooling is simply downsampling: it shrinks the output of a layer, which again helps the processor work faster. There are several pooling techniques. In max pooling, we take the largest pixel value in each small segment of the image. In average (mean) pooling, we take the average of the values in each segment instead. Pooling also makes the network somewhat invariant to small translations: a feature shifted by a pixel or so still produces a similar output. Max pooling is the most commonly used.
A simple example of Max Pooling, where we are taking the largest pixel value in each coloured square.
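Here is a minimal NumPy sketch of both pooling techniques, assuming non-overlapping 2x2 blocks (a stride equal to the block size) and input dimensions divisible by the block size. The `pool2d` helper and the 4x4 feature map are hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Downsample by taking the max (or mean) of each non-overlapping
    size x size block. Assumes both dimensions are divisible by `size`."""
    h, w = feature_map.shape
    # Split rows and columns into blocks: axes 1 and 3 index within a block.
    blocks = feature_map.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]], dtype=float)

print(pool2d(fm).tolist())               # [[6.0, 5.0], [7.0, 9.0]]
print(pool2d(fm, mode="mean").tolist())  # [[3.5, 2.0], [2.5, 6.0]]
```

Either way, the 4x4 map becomes 2x2, a quarter of the values for the next layer to process.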
A single neuron on its own behaves as a linear classifier. A neuron can switch on or off depending on the section of input data it sees; we call this property of a neuron its activation. Activation functions are mathematical functions that behave very much like valves. Think of the valve on a pressure cooker, which only opens once there is enough pressure. Input data that pushes the activation function past its threshold marks the neuron as active. We classify an image based on which neurons in the network got activated. There are many activation functions, but ReLU is the most famous of them: it passes positive values through unchanged and outputs zero for negative ones. Why you would choose ReLU is out of the scope of this document. I will soon write another article about different activation functions.
Back propagation is the process by which we try to bring the error down. By error, I mean the difference between y and y’. Reducing the error makes w fit the dataset that we gave to the network. Back propagation works out how much each weight contributed to the error (its gradient), and the gradient descent process then moves each weight a small step in the direction that reduces the error. Repeating this many times tries to bring the error value as close to zero as possible.
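A single neuron with ReLU can be sketched in NumPy as follows. This assumes the neuron computes a weighted sum plus a bias b, a term not in the simple x * w = y formula; the bias plays the role of the valve's threshold. The weights, bias and inputs are hypothetical.

```python
import numpy as np

def relu(z):
    """ReLU: pass positive values through unchanged, clamp negatives to 0."""
    return np.maximum(0.0, z)

def neuron(x, w, b):
    """A single neuron: a linear function of its inputs followed by ReLU.
    The weighted sum x . w + b is the linear part; ReLU decides whether
    the neuron is 'on' (positive output) or 'off' (zero)."""
    return relu(np.dot(x, w) + b)

w = np.array([0.5, -1.0, 2.0])   # hypothetical weights
b = -1.0                         # hypothetical bias, acting as a threshold

print(neuron(np.array([1.0, 0.0, 1.0]), w, b))  # 1.5 -> neuron is active
print(neuron(np.array([0.0, 1.0, 0.0]), w, b))  # 0.0 -> neuron stays off
```

The first input produces a weighted sum of 2.5, which clears the threshold; the second produces -2.0, so the "valve" stays shut.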
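To make the loop concrete, here is a minimal gradient descent sketch in NumPy for the single-layer formula x * w = y, assuming mean squared error as the error measure. The data, `true_w` and `learning_rate` are hypothetical. With only one layer, "backpropagation" reduces to computing a single gradient; in a deeper network it applies the chain rule to get the gradient for every layer's weights, but the update step is the same.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: 100 inputs with 3 features, generated from known weights
# so we can check that training recovers them.
true_w = np.array([2.0, -1.0, 0.5])
x = rng.normal(size=(100, 3))
y = x @ true_w

w = np.zeros(3)          # start from all-zero weights
learning_rate = 0.1      # hypothetical step size (a hyperparameter)

for step in range(200):
    y_pred = x @ w                        # forward pass
    err = y_pred - y
    loss = np.mean(err ** 2)              # mean squared error
    grad = 2 * x.T @ err / len(x)         # gradient of the loss w.r.t. w
    w -= learning_rate * grad             # step "downhill" to reduce the error
    if step % 50 == 0:
        print(f"step {step:3d}  error {loss:.6f}")

print(np.round(w, 3))   # should end up very close to true_w
```

The printed error shrinks every 50 steps. Too large a learning rate makes the updates overshoot and diverge; too small a one makes training crawl.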
The material above is pretty much enough to start working on applied #CNNs. Whenever you get stuck in the implementation phase, you can read more about that particular topic. Leave your questions in the comments section and I will address them. This brings us to the end of this part of the series. The second part is finished, and you can find the link below.
You can find the second part of the series at this link. Do follow me on Twitter, and you can also sign up for a small and infrequent mailing list that I maintain.