Nvidia filling the blanks: A Partial Convolutions Research Paper

by Uddeshya SinghSeptember 14th, 2018

So, it’s 2018 and Nvidia researchers are at it again, this time with an image in-painting algorithm that fills holes in images and can also be used to enhance image quality.

Comparison to other techniques

This research paper (arXiv:1804.07723v1 [cs.CV]) came out on 20 April 2018. It starts from the observation that recent in-painting approaches that do not use deep learning rely on the image statistics of the remaining image to fill in the hole.

PatchMatch, one of the state-of-the-art methods, iteratively searches for the best fitting patches to fill in the holes. While this approach generally produces smooth results, it is limited by the available image statistics and has no concept of visual semantics.

In the paper's example, PatchMatch was able to smoothly fill in the missing components of a painting using image patches from the surrounding shadow and wall, but a semantically-aware approach would make use of patches from the painting instead.

Deep neural networks learn semantic priors and meaningful hidden representations in an end-to-end fashion, which have been used for recent image in-painting efforts. These networks employ convolutional filters on images, replacing the removed content with a fixed value.

Other techniques add post-processing: Iizuka et al. use fast marching and Poisson image blending, while Yu et al. employ a follow-up refinement network to refine their raw network predictions. Another limitation of many recent approaches is the focus on rectangular holes, often assumed to be centered in the image. The authors find that these limitations may lead to over-fitting to rectangular holes and ultimately limit the utility of these models in applications.

To focus on the more practical irregular-hole use case, the authors collected a large benchmark of images with irregular masks of varying sizes. Their analysis looks at the effects of not just the size of the hole, but also whether the holes are in contact with the image border.

What’s different in this model?

The researchers propose the following changes relative to standard U-Net-like architectures, along with a new dataset:

  • They use partial convolutions with an automatic mask update step to achieve state-of-the-art results on image in-painting.
  • While previous works fail to achieve good in-painting results with skip links in a U-Net with typical convolutions, they demonstrate that substituting convolutional layers with partial convolutions and mask updates can achieve state-of-the-art in-painting results.
  • They propose a large irregular mask dataset.

The researchers gave a rather lucid reason for the mask update. Allow me to quote it here as it is:

To properly handle irregular masks, we propose the use of a Partial Convolutional Layer, comprising a masked and re-normalized convolution operation followed by a mask-update step. The concept of a masked and re-normalized convolution is also referred to as segmentation-aware convolutions for the image segmentation task, however they did not make modifications to the input mask. Our use of partial convolutions is such that given a binary mask our convolutional results depend only on the non-hole regions at every layer. Our main extension is the automatic mask update step, which removes any masking where the partial convolution was able to operate on an unmasked value. Given sufficient layers of successive updates, even the largest masked holes will eventually shrink away, leaving only valid responses in the feature map. The partial convolutional layer ultimately makes our model agnostic to placeholder hole values.

The Model Approach and Architecture

The proposed model uses stacked partial convolution operations and mask updating steps to perform image in-painting. Let’s start by defining the partial convolution and the mask-update mechanism.

For brevity, we refer to our partial convolution operation and mask update function jointly as the Partial Convolutional Layer.

Let W be the convolution filter weights and b the corresponding bias. X are the feature values (pixel values) for the current convolution (sliding) window and M is the corresponding binary mask. The partial convolution at every location is expressed as:

Partial Convolution mechanism
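
Since the formula was an image on the original page, here is a LaTeX rendering of it using the symbols W, b, X and M defined above. This is my transcription of the arXiv v1 text: x' denotes the output at the current window location, ⊙ is element-wise multiplication, and the 1/sum(M) scaling is how I read that version, so treat the exact form of the scaling factor as an assumption and check it against the paper.

```latex
x' =
\begin{cases}
W^{T}(X \odot M)\,\dfrac{1}{\mathrm{sum}(M)} + b, & \text{if } \mathrm{sum}(M) > 0 \\[6pt]
0, & \text{otherwise}
\end{cases}
```

The scaling compensates for the varying number of valid (unmasked) inputs in each window, so the output depends only on unmasked values.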

After each partial convolution operation, we then update our mask. Our unmasking rule is simple: if the convolution was able to condition its output on at least one valid input value, then we remove the mask for that location. This is expressed as:

Mask Update Scheme

and can easily be implemented in any deep learning framework as part of the forward pass. With sufficient successive applications of the partial convolution layer, any mask will eventually be all ones, if the input contained any valid pixels.
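
To make the two steps concrete, here is a minimal NumPy sketch of a single-channel partial convolution followed by the mask update. It is an illustration under my own assumptions, not the authors' implementation: the function name `partial_conv2d` is hypothetical, the kernel is assumed to have odd size, the stride is 1, padding is zero padding treated as holes, and the output is rescaled by 1/sum(M).

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution with mask update.

    x, mask : 2-D arrays of equal shape; mask is 1 for valid pixels, 0 for holes.
    weight  : 2-D filter with odd height and width (assumption of this sketch).
    Returns (output, updated_mask), both with the same shape as x.
    """
    kh, kw = weight.shape
    ph, pw = kh // 2, kw // 2
    # Zero padding; padded positions get mask 0, so they count as holes.
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    mp = np.pad(mask, ((ph, ph), (pw, pw)))

    out = np.zeros(x.shape, dtype=float)
    new_mask = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            X = xp[i:i + kh, j:j + kw]
            M = mp[i:i + kh, j:j + kw]
            valid = M.sum()
            if valid > 0:
                # Only unmasked inputs contribute; rescale by the valid count.
                out[i, j] = (weight * X * M).sum() / valid + bias
                # Mask update: at least one valid input, so unmask this location.
                new_mask[i, j] = 1.0
    return out, new_mask

# A 5x5 image with a 3x3 hole in the middle.
x = np.ones((5, 5))
mask = np.ones((5, 5))
mask[1:4, 1:4] = 0
w = np.ones((3, 3))

out1, mask1 = partial_conv2d(x, mask, w)     # hole shrinks to the centre pixel
out2, mask2 = partial_conv2d(out1, mask1, w) # hole disappears
```

Two 3×3 layers are enough to close a 3×3 hole here, which is the "holes eventually shrink away" behaviour described in the quote. Changing the pixel values inside the hole does not change `out1`, which is what makes the layer agnostic to placeholder hole values.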

Network Design

The network design is largely based on U-Net-like architectures, with one key change: all convolutional layers are replaced with partial convolutional ones.

The network architecture

In the architecture table, PConv 1 to PConv 8 form the encoder, and the layers that follow, which use up-sampling and skip links, form the decoder.

The BatchNorm column indicates whether PConv is followed by a Batch Normalization layer. The Non-linearity column shows whether and what non-linearity layer is used (following the BatchNorm if BatchNorm is used).

Loss Functions

Quoting the research paper:

Our loss functions target both per-pixel reconstruction accuracy as well as composition, i.e. how smoothly the predicted hole values transition into their surrounding context.

Given the input image with holes I_in, the initial binary mask M (0 for holes), the network prediction I_out, and the ground truth image I_gt, we first define our per-pixel losses L_hole = ‖(1 − M) ⊙ (I_out − I_gt)‖_1 and L_valid = ‖M ⊙ (I_out − I_gt)‖_1. These are the L1 losses on the network output for the hole and the non-hole pixels respectively.
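
The two per-pixel terms are simple enough to write down directly. Below is a small NumPy sketch, assuming the images and the mask are arrays of the same shape. The function name `per_pixel_losses` is mine, and I use a plain sum for the L1 norm; dividing by the number of pixels would be an equally reasonable choice.

```python
import numpy as np

def per_pixel_losses(i_out, i_gt, mask):
    """Return (L_hole, L_valid) as L1 norms.

    mask is 1 for valid (non-hole) pixels and 0 for holes.
    """
    diff = np.abs(i_out - i_gt)
    l_hole = ((1 - mask) * diff).sum()   # error inside the holes
    l_valid = (mask * diff).sum()        # error on the untouched pixels
    return l_hole, l_valid

i_gt = np.zeros((2, 2))
i_out = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])            # right column is a hole
l_hole, l_valid = per_pixel_losses(i_out, i_gt, mask)
```

In this toy case the hole pixels contribute 2 + 4 to `l_hole` and the valid pixels contribute 1 + 3 to `l_valid`.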

The perceptual loss is calculated using the expression below, where Ψ_n is the activation map of the nth selected layer.

Perceptual Loss
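
As I read the paper, the perceptual loss compares activation maps of the ground truth with those of both the raw output and a composite image I_comp, which is the output with the non-hole pixels set back to the ground truth. The sketch below follows that reading. The names `composite`, `perceptual_loss` and `toy_features` are hypothetical, and `toy_features` is only a stand-in for the pretrained network the paper uses to obtain Ψ_n.

```python
import numpy as np

def composite(i_out, i_gt, mask):
    """Keep ground-truth pixels where mask == 1 and predictions in the holes."""
    return mask * i_gt + (1 - mask) * i_out

def perceptual_loss(features, i_out, i_gt, mask):
    """Sum of L1 distances between activation maps.

    features: callable mapping an image to a list of activation maps,
              one per selected layer (the Ψ_n of the article).
    """
    i_comp = composite(i_out, i_gt, mask)
    f_out, f_comp, f_gt = features(i_out), features(i_comp), features(i_gt)
    loss = 0.0
    for a_out, a_comp, a_gt in zip(f_out, f_comp, f_gt):
        loss += np.abs(a_out - a_gt).sum() + np.abs(a_comp - a_gt).sum()
    return loss

def toy_features(img):
    """Stand-in feature extractor (assumption): the image and a 2x-subsampled copy."""
    return [img, img[::2, ::2]]

i_gt = np.zeros((2, 2))
i_out = np.ones((2, 2))
mask = np.array([[1.0, 0.0],
                 [1.0, 1.0]])            # one hole pixel, top right
loss = perceptual_loss(toy_features, i_out, i_gt, mask)
```

In a real setup you would swap `toy_features` for a function that returns the intermediate activations of a pretrained classifier.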

The style losses are also taken into account and are defined as:

Style Losses
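
Style losses of this kind compare Gram matrices (channel-by-channel correlations) of the activation maps instead of the maps themselves. Here is a minimal NumPy sketch of that idea. The function names are mine, the activation maps are assumed to have shape (C, H, W), and the 1/(C·H·W) normalization is my reading of the paper, so treat it as an assumption. In the paper the same term is evaluated once for the raw output and once for the composite image.

```python
import numpy as np

def gram(feat):
    """Normalized Gram matrix of an activation map of shape (C, H, W)."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)           # one row per channel
    return f @ f.T / (c * h * w)         # (C, C) channel correlations

def style_loss(feats_a, feats_b):
    """Sum over layers of the L1 distance between Gram matrices."""
    return sum(np.abs(gram(a) - gram(b)).sum() for a, b in zip(feats_a, feats_b))

# One "layer" with 2 channels of size 2x2.
feats_pred = [np.ones((2, 2, 2))]
feats_gt = [np.zeros((2, 2, 2))]
loss = style_loss(feats_pred, feats_gt)
```

Because the Gram matrix discards spatial layout, this term penalizes differences in texture statistics rather than pixel-aligned differences.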

Our final loss term is the total variation (TV) loss L_tv, which is the smoothing penalty on P, where P is the region of 1-pixel dilation of the hole region.

Smoothing Penalty
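
Here is a rough NumPy sketch of a TV penalty restricted to the dilated hole region P. It makes a few assumptions of its own: the image is single-channel, the dilation uses the 8-neighbourhood (a 3×3 structuring element), the penalty is a plain sum of absolute differences between neighbouring pixels that both lie in P, and the function names are hypothetical. The input image is the composite (prediction inside the holes, ground truth elsewhere).

```python
import numpy as np

def dilate_hole(mask):
    """Boolean map of the hole region (mask == 0) grown by one pixel."""
    h, w = mask.shape
    hole = np.pad(mask == 0, 1)          # pad with False
    region = np.zeros((h, w), dtype=bool)
    for di in range(3):
        for dj in range(3):
            region |= hole[di:di + h, dj:dj + w]
    return region

def tv_loss(i_comp, mask):
    """Total variation of i_comp, counted only between neighbours inside P."""
    p = dilate_hole(mask)
    horiz = p[:, :-1] & p[:, 1:]         # both (i, j) and (i, j+1) in P
    vert = p[:-1, :] & p[1:, :]          # both (i, j) and (i+1, j) in P
    dh = np.abs(i_comp[:, 1:] - i_comp[:, :-1])
    dv = np.abs(i_comp[1:, :] - i_comp[:-1, :])
    return (dh * horiz).sum() + (dv * vert).sum()

mask = np.ones((5, 5))
mask[2, 2] = 0                               # single-pixel hole
i_comp = np.tile(np.arange(5.0), (5, 1))     # horizontal ramp 0..4 in every row
loss = tv_loss(i_comp, mask)
```

With a single-pixel hole, P is the 3×3 block around it, so only the differences inside that block are penalized.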

So, the total loss (after tuning the coefficient hyperparameters) comes out to be:

Total Loss
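
For reference, this is the weighted sum as I transcribe it from the paper. The names L_style_out and L_style_comp (the style loss on the raw output and on the composite image) and the specific coefficients are my reading of the paper, so please verify them against the original before relying on them.

```latex
L_{total} = L_{valid} + 6\,L_{hole} + 0.05\,L_{perceptual}
          + 120\,\bigl(L_{style_{out}} + L_{style_{comp}}\bigr) + 0.1\,L_{tv}
```

The much larger weight on the hole term than on the valid term reflects that the hole pixels are the ones the network actually has to invent.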

Testing and Results in Hole Filling

Comparisons among PConv

You can look at the researchers' results (PConv) and see how they fare against other models on 256×256-pixel images.

An important thing to note is that the ImageNet and Places2 models train for 10 days, whereas the CelebA-HQ model trains in 3 days. All fine-tuning is performed in one day.

So, you can see how long it takes to train such models, even on an NVIDIA V100 GPU (16GB) with a batch size of 6!

Model Benchmarks

As a personal opinion, the training time is worth keeping in mind when reading these results. Even so, the PConv model makes a convincing case as an alternative on both L1 error and IScore.

The researchers have claimed that:

Our method outperforms the other methods in most cases across different time periods and hole-to-image area ratios.

Graphical Benchmarks

Looking at the graphical benchmarks, I would not really argue with that.

Other Uses

Image super resolution task

Resolution Enhance Task Results

Yes, this algorithm can be used to enhance image resolution too. Let the following image tell its own story. I will leave this one as a mystery 😉

Mask Updates with one-one mappings

Limitations

According to the research paper, the model does not suffer catastrophic performance degradation as holes increase in size, but it does fail on some sparsely structured images, such as the bars on a door, and, like most methods, it struggles with the largest holes.

Note: I have cited the following resource for this article.

Until next time, happy learning!