I was rewriting codebase of our neural network image upscaling service — Let’s Enhance to make it ready for bigger and faster models and API we are working on. As we work with image generation (superresolution, deblurring, etc) we do rely on a typical image processing libraries like OpenCV and PIL. I always had suspicions that it makes sense to use Tensorflow image processing capabilities — in theory, they should be faster. So, I decided to stick to native Tensorflow image preprosessing and dataset building tools using dataset.map() to keep all operations in tensors all around my code.
The problem was quite awful — my new and shiny code for superresolution wasn’t able to reproduce not only any state-of-art networks but even my own code that I wrote 4 months ago. And the ugliest part was that results of superresolution itself were pretty good sometimes, the network was working, although not reaching target PSNR and having strange visual artifacts sometimes, like doubling of small lines.
What initially was looking as minor bug became 60 days of struggle and sleepless nights. My faulty logic was simple — there’s something wrong with network definition or training process. Data preprocessing is definitely fine, as I am having meaningful results and visual control over image processing in Tensorboard.
I tweaked everything I was able to find, defined network using Keras, Slim, raw TF — nothing, looked for changes in TF 1.3->1.4->1.5 and different CUDA versions, paddings behaviors. I am ashamed even to tell you about my latest suspicions, which involved defects in GPU RAM and statics. I was tweaking perceptual losses and style losses looking for a reason. And each iteration took days to retrain until some meaningful result…
Yesterday I found The Bug, when looking into Tensorboard. It was almost subliminal feeling that something is wrong with the image. I disregarded network output and just overlayed target image and input image(that is a downscaled target image) in Photoshop. Here’s what I got.
Looks strange, somekind of displacement happening here. Totally againts any logic, this just can’t be true! My code is dead simple. Read image, crop image, resize image. All in Tensorflow.
Anyway, RTFM. tf.image.resize_bicubic has a parameter — “align corners”. How in the world would you like to downscale image and not have corners aligned? You can! So there is a very weird behavior of this function known for a long time — read this thread. They can’t fix it as this can break lots of old code and pre-trained networks.
Our
tf.image.resize_area
function isn't even reflection equivariant. It would be lovely to fix this, but I'd be worried about breaking old models.
This code does actually displace your image by one pixel to the left and top. Thread suggests that even the interpolation is broken in TensorFlow. It’s 2018, people. Here are the actual downscaling results with TF.
Stick to Scipy/OpenCV/numpy/PIL whatever you prefer for image processing. The second I changed it my network worked like a charm (next day actually, when I saw training results).