So, it’s 2018 and NVIDIA researchers are at it again. This time, with a revolutionary image in-painting algorithm: essentially, a hole-filling and quality-enhancing technique.

Comparison to other techniques

This research paper (arXiv:1804.07723v1 [cs.CV]) came out on 20 April 2018. It makes the point that recent in-painting approaches which do not use deep learning rely on image statistics of the remaining image to fill in the hole. PatchMatch, one of the state-of-the-art methods, iteratively searches for the best-fitting patches to fill in the holes. While this approach generally produces smooth results, it is limited by the available image statistics and has no concept of visual semantics. PatchMatch was able to smoothly fill in the missing components of a painting using image patches from the surrounding shadow and wall, but a semantically-aware approach would make use of patches from the painting instead.

Deep neural networks learn semantic priors and meaningful hidden representations in an end-to-end fashion, and these have been used in recent image in-painting efforts. Such networks employ convolutional filters on images, replacing the removed content with a fixed value. These approaches often rely on post-processing: Iizuka et al. use fast marching and Poisson image blending, while Yu et al. employ a follow-up refinement network to refine their raw network predictions. Another limitation of many recent approaches is the focus on rectangular-shaped holes, often assumed to be centered in the image. Quoting the paper: “We find these limitations may lead to over-fitting to the rectangular holes, and ultimately limit the utility of these models in application. In order to focus on the more practical irregular hole use case, we collect a large benchmark of images with irregular masks of varying sizes. In our analysis, we look at the effects of not just the size of the hole, but also whether the holes are in contact with the image border.”

What’s different in this model?
The researchers have proposed the following modifications to the standard U-Net-like structures: use partial convolutions with an automatic mask update step to achieve state-of-the-art results on image in-painting. While previous works fail to achieve good in-painting results with skip links in a U-Net with typical convolutions, they demonstrate that substituting convolutional layers with partial convolutions and mask updates can achieve state-of-the-art in-painting results. They have also contributed a large irregular mask dataset.

The researchers gave a rather lucid reason for the mask update. Allow me to quote it here as it is:

“To properly handle irregular masks, we propose the use of a Partial Convolutional Layer, comprising a masked and re-normalized convolution operation followed by a mask-update step. The concept of a masked and re-normalized convolution has also been referred to as segmentation-aware convolutions for the image segmentation task; however, that work did not make modifications to the input mask. Our use of partial convolutions is such that given a binary mask our convolutional results depend only on the non-hole regions at every layer. Our main extension is the automatic mask update step, which removes any masking where the partial convolution was able to operate on an unmasked value. Given sufficient layers of successive updates, even the largest masked holes will eventually shrink away, leaving only valid responses in the feature map. The partial convolutional layer ultimately makes our model agnostic to placeholder hole values.”

The Model Approach and Architecture

The proposed model uses stacked partial convolution operations and mask-updating steps to perform image in-painting. Let’s start by defining the convolution and mask-update mechanism. For brevity, the paper refers to the partial convolution operation and mask update function jointly as the partial convolutional layer.
Partial Convolutional Layer

Let W be the convolution filter weights and b its corresponding bias. X are the feature values (pixel values) for the current convolution (sliding) window and M is the corresponding binary mask. The partial convolution at every location, similarly defined in prior work, is expressed as:

x′ = Wᵀ(X ⊙ M) · sum(1)/sum(M) + b,  if sum(M) > 0
x′ = 0,  otherwise

where ⊙ denotes element-wise multiplication and sum(1)/sum(M) is a scaling factor that re-normalizes for the varying number of valid (unmasked) inputs in each window.

After each partial convolution operation, we then update our mask. The unmasking rule is simple: if the convolution was able to condition its output on at least one valid input value, then we remove the mask for that location. This is expressed as:

m′ = 1,  if sum(M) > 0
m′ = 0,  otherwise

and can easily be implemented in any deep learning framework as part of the forward pass. With sufficient successive applications of the partial convolution layer, any mask will eventually be all ones, provided the input contained any valid pixels.

Network Design

The network design is largely based on U-Net-like architectures with just one minor change: all convolutional layers are replaced with partial convolutional ones. Elaborating on the network architecture, PConv1 to PConv8 form the encoding network, and the following layers with upsampling skip links form the decoding part of the same network. The BatchNorm column indicates whether a PConv layer is followed by a Batch Normalization layer. The Non-linearity column shows whether and what non-linearity layer is used (following the BatchNorm if BatchNorm is used).

Loss Functions

From the excerpts of the research paper: “Our loss functions target both per-pixel reconstruction accuracy as well as composition, i.e. how smoothly the predicted hole values transition into their surrounding context.” Given the input image with holes I_in, the initial binary mask M (0 for holes), the network prediction I_out, and the ground truth image I_gt, we first define the per-pixel losses.
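The two rules above are easy to sketch in code. Below is a minimal NumPy toy implementation (not the paper’s code) of a single-channel partial convolution with stride 1 and zero padding; `partial_conv2d` is a name I’ve chosen for illustration. It shows how the output depends only on valid pixels and how repeated applications shrink the mask:

```python
import numpy as np

def partial_conv2d(X, M, W, b):
    """Toy single-channel partial convolution (stride 1, zero padding).
    X: feature map, M: binary mask (1 = valid, 0 = hole), W: filter, b: bias.
    Returns the re-normalized convolution output and the updated mask."""
    kh, kw = W.shape
    ph, pw = kh // 2, kw // 2
    Xp = np.pad(X, ((ph, ph), (pw, pw)))
    Mp = np.pad(M, ((ph, ph), (pw, pw)))
    H, Wd = X.shape
    out = np.zeros_like(X, dtype=float)
    new_mask = np.zeros_like(M, dtype=float)
    ones = kh * kw  # sum(1) over the window
    for i in range(H):
        for j in range(Wd):
            xw = Xp[i:i + kh, j:j + kw]
            mw = Mp[i:i + kh, j:j + kw]
            s = mw.sum()
            if s > 0:
                # masked, re-normalized convolution: scale by sum(1)/sum(M)
                out[i, j] = (W * xw * mw).sum() * (ones / s) + b
                # mask update: at least one valid input was seen, so unmask
                new_mask[i, j] = 1.0
            # else: output stays 0 and the location stays masked
    return out, new_mask
```

With a 3×3 filter, a single pass unmasks any hole pixel that has at least one valid neighbor in its window; a 3×3 hole needs two passes before its center becomes valid, which is exactly the “holes eventually shrink away” behavior described above.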
These are the L1 losses on the network output for the hole and the non-hole pixels respectively:

L_hole = ||(1 − M) ⊙ (I_out − I_gt)||₁  and  L_valid = ||M ⊙ (I_out − I_gt)||₁

The perceptual loss (misspelled “Perpetual” in places) has been calculated, per the paper, using:

L_perceptual = Σₙ ||Ψₙ(I_out) − Ψₙ(I_gt)||₁ + Σₙ ||Ψₙ(I_comp) − Ψₙ(I_gt)||₁

where Ψₙ is the activation map of the nth selected layer and I_comp is the raw output I_out with the non-hole pixels set directly to the ground truth.

The style losses have also been taken into consideration, computed on the auto-correlation (Gram) matrices of the same feature maps:

L_style_out = Σₙ Kₙ ||Ψₙ(I_out)ᵀΨₙ(I_out) − Ψₙ(I_gt)ᵀΨₙ(I_gt)||₁
L_style_comp = Σₙ Kₙ ||Ψₙ(I_comp)ᵀΨₙ(I_comp) − Ψₙ(I_gt)ᵀΨₙ(I_gt)||₁

where Kₙ is a per-layer normalization factor. The final loss term is the total variation (TV) loss L_tv, which is a smoothing penalty on P, where P is the region of 1-pixel dilation of the hole region:

L_tv = Σ₍ᵢ,ⱼ₎∈P ||I_comp(i, j+1) − I_comp(i, j)||₁ + Σ₍ᵢ,ⱼ₎∈P ||I_comp(i+1, j) − I_comp(i, j)||₁

So, the total loss (after coefficient hyper-parameter tuning) comes out to be:

L_total = L_valid + 6 L_hole + 0.05 L_perceptual + 120 (L_style_out + L_style_comp) + 0.1 L_tv

Testing and Results in Hole Filling

One can easily have a look at the researchers’ results (PConv) and see how the model fares against other models on 256×256 pixel images. An important thing to note is that the ImageNet and Places2 models train for 10 days, whereas CelebA-HQ trains in 3 days. All fine-tuning is performed in one day. So, you can see the amount of time it takes to train such models even when using an NVIDIA V100 GPU (16GB) with a batch size of 6!

Model Benchmarks

I would like to acknowledge, as a personal opinion, the time sensitivity involved in carrying out the precision results. Considering that, this PConv model is really convincing as a strong alternative in both the L1 and IScore metrics. The researchers have claimed that: “Our method outperforms the other methods in most cases across different hole-to-image area ratios,” and looking at the graphical benchmarks, I would not really argue.

Other Uses

Image super-resolution: yes, this algorithm can be used to enhance image resolution too. Let the following image tell its own story.
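The per-pixel losses and the final weighted sum are simple enough to sketch directly. Here is a small NumPy illustration (function names are my own, not from the paper’s code; I use a plain mean as the L1 normalization, a simplification of the paper’s element-count normalization):

```python
import numpy as np

def l1(x):
    # mean absolute value as a simple stand-in for a normalized L1 norm
    return np.abs(x).mean()

def per_pixel_losses(I_out, I_gt, M):
    """L1 losses on hole and non-hole pixels.
    M is the binary mask: 1 = valid pixel, 0 = hole."""
    L_hole = l1((1 - M) * (I_out - I_gt))
    L_valid = l1(M * (I_out - I_gt))
    return L_hole, L_valid

def total_loss(L_valid, L_hole, L_perceptual, L_style_out, L_style_comp, L_tv):
    # Coefficients as reported in the paper after hyper-parameter tuning
    return (L_valid + 6.0 * L_hole + 0.05 * L_perceptual
            + 120.0 * (L_style_out + L_style_comp) + 0.1 * L_tv)
```

Note how heavily the style terms (weight 120) and the hole term (weight 6) are emphasized relative to the valid-region reconstruction.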
I will leave this one as a mystery 😉

Limitations

I would like to cite the research paper to state that the model, in itself, won’t be a victim of catastrophic performance degradation as holes increase in size, but it does fail for some sparsely structured images, such as the bars on doors, and, like most methods, struggles on the largest of holes.

Note: I have cited the resource for this article.

Until next time, happy learning!