Introduction to Convolution

In this article, we are going to learn about grayscale images, colour images and the process of convolution.

Grayscale image

A grayscale image is one in which the image is represented using only shades of grey. The intensity of each pixel is denoted by a value from 0 to 255, i.e., from black to white, stored as an 8-bit integer. It uses only one channel.

Colour image

Colour images are constructed by combining red, green and blue (RGB) in variable proportions; these 3 colours are therefore called the primary colours. Each pixel of a colour image contains three channels: the R channel, the G channel and the B channel, each with its own intensity value ranging from 0 to 255.
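To make the difference between one channel and three channels concrete, here is a minimal sketch in plain Python. The pixel values and the helper name shape are made up for illustration; real images are usually loaded with an imaging library rather than typed in by hand.

```python
# A hypothetical 2x3 grayscale image: one intensity (0 to 255) per pixel.
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A hypothetical 2x3 colour image: one (R, G, B) triple per pixel.
colour = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 0), (0, 0, 0), (255, 255, 255)],
]


def shape(image):
    """Return (height, width, channels) for the nested-list images above."""
    first_pixel = image[0][0]
    channels = len(first_pixel) if isinstance(first_pixel, tuple) else 1
    return len(image), len(image[0]), channels


print(shape(gray))    # (2, 3, 1)
print(shape(colour))  # (2, 3, 3)
```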

What is convolution

Convolution is the process of multiplying each pixel in a region of the image by the corresponding value of the filter and then adding all of the products to get a single result. Repeating this for every position of the filter gives the output image representation.
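The multiply-and-add step for a single position of the filter can be sketched as follows. The function name patch_response and the 2x2 numbers are assumptions chosen only to keep the arithmetic easy to follow.

```python
def patch_response(patch, kernel):
    """Multiply matching entries of patch and kernel, then sum the products."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


# Hypothetical 2x2 region of an image and a 2x2 filter.
patch = [[1, 2], [3, 4]]
kernel = [[1, 0], [0, -1]]

# 1*1 + 2*0 + 3*0 + 4*(-1) = -3
print(patch_response(patch, kernel))
```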

Now let us look at an example of convolution.

We pass a 6x6 input through a 3x3 filter (here we are using a vertical filter) and get a 4x4 output.

Now let us look at how each of the entries in the output is obtained.

We place the filter on top of the input at the top-left corner and perform the convolution step (multiply the corresponding entries and add the products together). The result is the corresponding output entry. Here we take a stride of 1. That is, the filter moves 1 step to the right after each calculation. When it reaches the end of a row, it moves 1 row down and starts again from the left. This process goes on until the filter reaches the bottom-right corner.
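The sliding procedure described above can be sketched as a pair of loops over the filter's top-left position. This is a minimal illustration, not an optimised implementation; the function name convolve2d, its stride parameter and the 4x4 test image are assumptions. As in the description, the filter is applied as it is, without flipping it.

```python
def convolve2d(image, kernel, stride=1):
    """Slide kernel over image from top-left to bottom-right, no padding."""
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(kernel), len(kernel[0])
    output = []
    # 'top' and 'left' mark where the filter's top-left corner sits.
    for top in range(0, n_rows - f_rows + 1, stride):
        out_row = []
        for left in range(0, n_cols - f_cols + 1, stride):
            total = 0
            for i in range(f_rows):
                for j in range(f_cols):
                    total += image[top + i][left + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x4 input and 2x2 filter.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 0], [0, 1]]

print(convolve2d(image, kernel))            # [[7, 9, 11], [15, 17, 19], [23, 25, 27]]
print(convolve2d(image, kernel, stride=2))  # [[7, 11], [23, 27]]
```

With a stride of 2 the filter skips every other position, so the output shrinks from 3x3 to 2x2.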

The Convolution operation: The part of the input to be convolved with the filter in each step is highlighted.

The 1st output entry:

1(2) +1(0) +1(-1) +1(1) +1(0) +1(-2) +1(2) +1(0) +1(-1)

= 2 -1 +1 -2 +2 -1

= 1

The 2nd output entry:

1(2) +1(0) +0 +1(1) +1(0) +0 + 1(2) +1(0) +0

= 2 +0 +0 +1 +0 +0 +2 +0 +0

= 5

The 3rd output entry (terms that equal zero are omitted):

1(2) +1(1) +1(2)

= 2 +1 +2

= 5

The 4th output entry (terms that equal zero are omitted):

1(-1)+1(-2)+1(-1)

= -1 -2 -1

= -4

By performing similar calculations:

The 5th output entry = 1

The 6th output entry = 5

The 7th output entry = 5

The 8th output entry = -4

The 9th output entry = 1

The 10th output entry = 5

The 11th output entry = 5

The 12th output entry = -4

The 13th output entry = 1

The 14th output entry = 5

The 15th output entry = 5

The 16th output entry = -4
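The 16 entries listed above can be checked with a few lines of Python. The input and filter values here are not stated explicitly in the text; they are inferred from the arithmetic of the first four entries, assuming every row of the input is identical (which is consistent with every output row being 1, 5, 5, -4). Treat them as a reconstruction, not as the original figures.

```python
# Assumed 6x6 input: every row is 1 1 1 0 0 1 (inferred from the sums above).
image = [[1, 1, 1, 0, 0, 1] for _ in range(6)]

# Assumed 3x3 vertical filter (read off from the 1st output entry).
kernel = [
    [2, 0, -1],
    [1, 0, -2],
    [2, 0, -1],
]

n, f = len(image), len(kernel)

# One output entry per filter position; stride 1, no padding.
output = [
    [
        sum(image[r + i][c + j] * kernel[i][j] for i in range(f) for j in range(f))
        for c in range(n - f + 1)
    ]
    for r in range(n - f + 1)
]

for row in output:
    print(row)  # each of the 4 rows prints [1, 5, 5, -4]
```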

The output we obtained here is of order 4 (4x4), while the input we gave was of order 6 (6x6). Since the output is smaller than the input, we can say that some information loss occurs here.

To prevent this loss of information, we use the padding technique.

Padding refers to the extra pixels that are added around the border of an input image. Padding gives the filter more space to cover the image, so pixels near the border are covered by the filter more often, which can help in improving the accuracy of image analysis.

Broadly classified, there are two types of padding. They are valid padding and same padding.

Valid Padding:

It implies no padding at all. That is, the input image is fed into the filter as it is. So if we consider an input of order (n), a filter of order (f) and take stride=1, we get an output image of order (n-f+1).

Notice that the order of the output image decreases. Hence some information is lost as we go from the input to the output. The example provided above is for only one convolutional layer, but deep neural networks have more than one convolutional layer. When this output image is passed through filters in further layers, it shrinks further in size.
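The shrinkage can be seen by applying the (n-f+1) formula repeatedly. This is a small sketch; the function name valid_output_size and the choice of two layers with 3x3 filters are assumptions for illustration.

```python
def valid_output_size(n, f):
    """Output order for an input of order n and a filter of order f (stride 1, no padding)."""
    return n - f + 1


size = 6          # order of the input from the example above
sizes = [size]
for _ in range(2):  # two hypothetical convolutional layers, each with a 3x3 filter
    size = valid_output_size(size, 3)
    sizes.append(size)

print(sizes)  # [6, 4, 2]
```

After only two layers the 6x6 input is down to 2x2, which is why valid padding alone is limiting in deeper networks.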

Same Padding:

In the case of same padding, we add 'p' padding layers to the input image in such a way that the output has the same number of pixels as the input. So in simple terms, we are adding pixels around the input to get the same number of pixels at the output as in the original input.

So if the padding value is '0', no pixels are added to the input. If the padding value equals '1', a border 1 pixel wide is added around the input image, and so on for higher padding values.

So if we consider an input of order (n), a filter of order (f) and take stride=1, adding 'p' padding layers gives an output image of order (n+2p-f+1). For the output to have the same order as the input we need n+2p-f+1 = n, that is p = (f-1)/2. For a 3x3 filter this gives p = 1. We can fill the padding layer either with zeros or with copies of the adjacent entries. The more commonly used method is to fill the padding layer with zeros.
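Zero padding and the (n+2p-f+1) formula can be sketched together as follows. The helper names zero_pad and output_size are assumptions, and the all-ones 6x6 image is a placeholder; the padding amount p = (f-1)/2 holds for an odd filter order and stride 1.

```python
def zero_pad(image, p):
    """Return a copy of image with a border of p zero-valued pixels on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]                      # top border
    padded += [[0] * p + list(row) + [0] * p for row in image]    # left and right borders
    padded += [[0] * width for _ in range(p)]                     # bottom border
    return padded


def output_size(n, f, p=0):
    """Output order for input order n, filter order f, padding p (stride 1)."""
    return n + 2 * p - f + 1


n, f = 6, 3
p = (f - 1) // 2  # same padding for an odd filter order: p = 1 here

image = [[1] * n for _ in range(n)]  # placeholder 6x6 input
padded = zero_pad(image, p)

print(len(padded), len(padded[0]))  # 8 8
print(output_size(n, f))            # 4  (valid padding)
print(output_size(n, f, p))         # 6  (same padding)
```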

Previously published at https://ashwinsharma.tech/knowing-convolution-basics