Identifying motorable areas using FCN8

Segmentation is essential for image analysis tasks. Semantic segmentation describes the process of associating each pixel of an image with a class label (such as flower, person, road, sky, ocean, or car).

Image source: Mathworks

Many sectors see a lot of potential in semantic segmentation approaches. These include autonomous driving, industrial inspection of boilers, thermal charts etc., classification of terrain visible in satellite imagery, and medical imaging analysis. Out of personal interest I also studied the detection of diseases in plants from their leaves. This too involves segmentation, to separate the veins or blade from the actual disease markings, which makes processing and detecting the disease easier and more accurate.

But what is semantic segmentation actually?

Semantic segmentation is understanding an image at the pixel level, i.e., we want to assign each pixel in the image an object class. For example, check out the following images.

Input image. Source
Semantic segmentation. Source

In the above image there are only three classes: human, bike and everything else. FCN can be trained to detect road, plants and sky as well.

VOC2012 and MSCOCO are the most important datasets for semantic segmentation.

In 2014, Fully Convolutional Networks (FCN) by Long et al. from Berkeley popularized CNN architectures for dense predictions without any fully connected layers. This allowed segmentation maps to be generated for images of any size and was also much faster than the patch classification approach used earlier. Almost all subsequent state-of-the-art approaches to semantic segmentation adopted this paradigm.

Apart from fully connected layers, one of the main problems with using CNNs for segmentation is pooling layers. Pooling layers increase the field of view and are able to aggregate the context while discarding the ‘where’ information.
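To make the loss of ‘where’ information concrete, here is a minimal NumPy sketch. The helper `max_pool_2x2` is a hypothetical name written for this illustration, not part of any library, and it assumes a 2D input with even height and width. Two inputs with the same activation at different positions inside one pooling window end up with identical pooled outputs.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2D array (even height and width assumed)."""
    h, w = x.shape
    # Split each axis into (block index, position inside block), then reduce inside blocks.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Same activation, different position inside the top-left 2x2 window.
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)

print(np.array_equal(a, b))                # the inputs differ
print(np.array_equal(pooled_a, pooled_b))  # the pooled maps do not
```

After pooling, nothing in the 2x2 output tells us which of the four pixels in the window was active, which is exactly the information a per-pixel labelling needs.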
However, semantic segmentation requires the exact alignment of class maps and thus needs the spatial information to be preserved. Two different classes of architectures evolved in the literature to tackle this issue.

The first one is the encoder-decoder architecture. The encoder gradually reduces the spatial dimension with pooling layers, and the decoder gradually recovers the object details and spatial dimension. There are usually shortcut connections from encoder to decoder to help the decoder recover the object details better. The second approach is not discussed here.

While going through padding differences in transposed convolution, I learnt something really interesting about SAME and VALID padding. The most important thing to understand here is that in Valid padding the filter kernel doesn’t go outside the input image dimensions, and this is true for both convolution and transposed convolution. Similarly, in Same padding the kernel can go outside the image dimensions.

Talking more about Valid padding: as you increase the stride of the kernel, the input image is padded between the pixels. If the stride is 2, there will be one row and one column padded between each existing row and column. If the stride is 1 there won’t be any padding.

Stride: 1, kernel: 3x3 (source)
Stride: 2, kernel: 3x3 (source)

Keeping k the same and increasing the stride decreases overlapping. This overlapping refers to the common area calculated by adjacent kernel actions. Let’s also visualize the opposite effect.

Stride: 2, kernel: 4x4 (source)

Thus the padded input image depends upon the stride as

Ip_d = (I - 1) * s

where s = stride, I = input dimension, and Ip_d is the padded input dimension. The output image dimension depends upon the padded input image dimension and kernel size as below:

O_d = Ip_d + k
O_d = (I - 1) * s + k

where k is the kernel size. This equation holds true whether kernel size is greater or smaller than the stride and can be verified here.
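As a rough check of the relation O_d = (I - 1) * s + k, here is a minimal 1D NumPy sketch. The function `transposed_conv1d_valid` is a hypothetical helper written for this illustration: it places a scaled copy of the kernel at every stride step and sums the overlaps, without cropping anything. It is not TensorFlow's implementation, and a framework may size its output differently in some cases, for instance when the kernel is smaller than the stride.

```python
import numpy as np

def transposed_conv1d_valid(x, kernel, stride):
    """Naive 1D transposed convolution: scatter scaled kernels, sum the overlaps."""
    contributions = {}
    for n, value in enumerate(x):
        for j, weight in enumerate(kernel):
            pos = n * stride + j  # output position touched by input n, kernel tap j
            contributions[pos] = contributions.get(pos, 0.0) + value * weight
    length = max(contributions) + 1  # the output ends at the last touched position
    return np.array([contributions.get(p, 0.0) for p in range(length)])

x = np.ones(3)  # I = 3

# Same settings as the figures: (stride, kernel size)
for s, k in [(1, 3), (2, 3), (2, 4)]:
    out = transposed_conv1d_valid(x, np.ones(k), s)
    print(f"s={s}, k={k}: length {len(out)}, formula {(len(x) - 1) * s + k}, values {out}")
```

With all-ones inputs, values greater than 1 in the output mark the positions where adjacent kernel placements overlap, so the effect of changing the stride or the kernel size on overlapping can be read directly from the printed arrays.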
However, my colleague Keshav Aggarwal derived a better equation while playing with some code on tensorflow, which says:

O_d = I * s + max(k - s, 0)

where all variables are the same as above. I suggest playing around with the code a bit.

Same padding is simpler but rather mysterious. Same padding always pads the empty rows and columns on the outside of the image. In normal convolution, even if padding is same, no padding is actually done on the input image when the kernel can sweep the complete image properly with the mentioned stride. However, if some rows or columns are left over due to the kernel size and stride value, some extra columns and rows are added to cover the whole image.

This is not the case in transposed convolution. The output image dimension does not depend on the kernel size of the filter but is the input dimension multiplied by the mentioned stride:

O_d = I_d * s

where s = stride, I_d = input dimension, and O_d is the output dimension. The output dimension is calculated by the system beforehand in this case, and the image is then padded on the outside accordingly before applying the filter, so that the output dimension after the deconvolution is the same as calculated. Priority is given to adding columns equally on both sides of the image. However, if they can’t be added equally, the remaining extra column is added to the right side.

So how can I up-sample an image using both of these filters? It’s simple now that we have the equations. Suppose we want to upscale an image to two times the original. For Same padding you can set the kernel to any suitable value and the stride to 2. For Valid padding you can set both the kernel and stride value to 2.

However, the performance of these filters is an area of experimentation. I found Same padding to work better than Valid padding here. Setting the kernel size to an even number is not a good practice, but if you want to upscale by a factor of 2 using valid padding, there seems to be no other way.
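The equations from this article can be put side by side in a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, and the input size of 16 is an arbitrary example value.

```python
def same_output_dim(i, s):
    """Same padding: O_d = I_d * s, independent of the kernel size."""
    return i * s

def valid_output_dim(i, k, s):
    """Valid padding: O_d = (I - 1) * s + k."""
    return (i - 1) * s + k

def valid_output_dim_alt(i, k, s):
    """Valid padding, the alternative form quoted above: O_d = I * s + max(k - s, 0)."""
    return i * s + max(k - s, 0)

i, s = 16, 2  # example: upscale a 16-pixel dimension by a factor of 2

print("same :", same_output_dim(i, s))
for k in (1, 2, 3, 4):
    print(f"valid, k={k}:", valid_output_dim(i, k, s), valid_output_dim_alt(i, k, s))
```

For a 2x upscale with valid padding, only k = 2 lands exactly on 2 * I in the first valid formula, which is why the even kernel size is hard to avoid there. The two valid formulas agree whenever k is at least s, since (I - 1) * s + k = I * s + (k - s), and differ only when the kernel is smaller than the stride.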
You can visit some of my projects on my Github profile, and for more articles by me visit my Medium account or Wordpress.

Semantic Segmentation and Transposed Convolution.
