Deep learning has supercharged ‘cognitive’ tasks such as vision and language processing. Even Google has switched to neural-network-based translation. One possible reason for its success is that it does not require domain-specific knowledge to obtain state-of-the-art results. Also, massively parallel hardware like GPUs, coupled with well-designed frameworks like TensorFlow, has helped fuel the AI revolution. This post talks about another such ‘cognitive’ task: colouring black-and-white photos using deep learning.
This article is based on a fairly recent paper: https://arxiv.org/pdf/1603.08511.pdf
It assumes basic knowledge of neural networks and loss functions.
The task is fairly simple to state: take a black-and-white photo and produce a coloured version of it. Intuitively, the idea is straightforward: depending on what is in the picture, it is often possible to tell what the colour should be. The leaves of trees are generally green, the sky is blue, clouds are white, and so on. All that remains is to get a computer to do the same.
Previous works have also used deep learning for this task, typically with regression to predict the colour of each pixel, trained with a Mean Squared Error (MSE) loss. This, however, produces fairly bland and dull results. The authors noted that MSE encourages the model to ‘average out’ the plausible colours in order to minimise the error, which results in a washed-out look. They instead pose colourisation as a classification problem.
The authors used the LAB colour space (rather than the more common RGB). In the LAB scheme, the L channel records the light intensity, and the other two channels record the colour opponents green–red (a) and blue–yellow (b), respectively.
One good reason to use the LAB colour space is that it keeps the light intensity values separate. A B/W picture can be considered to be just the L channel, so the model won’t have to learn how to keep light intensities right when it makes predictions (it would have to if RGB were used). The model only has to learn how to colour images, allowing it to focus on what matters.
The model outputs the AB values, which are then combined with the original L channel (the B/W image) to produce the coloured version.
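To make the channel split concrete, here is a minimal sRGB-to-LAB conversion in plain NumPy; this is a sketch using the standard D65 formulas, and in practice you would use a library routine such as scikit-image’s `rgb2lab`:

```python
import numpy as np

# Minimal sRGB -> CIELAB conversion (D65 white point), just to show how
# the lightness channel L separates from the colour channels a and b.
def rgb_to_lab(rgb):
    # 1. Undo the sRGB gamma curve.
    c = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # 2. Linear RGB -> XYZ.
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = c @ M.T
    # 3. Normalise by the D65 white point, apply the Lab nonlinearity.
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16 / 116)
    L = 116 * f[..., 1] - 16            # lightness: all a B/W photo gives us
    a = 500 * (f[..., 0] - f[..., 1])   # green-red opponent
    b = 200 * (f[..., 1] - f[..., 2])   # blue-yellow opponent
    return np.stack([L, a, b], axis=-1)

white = np.array([[[1.0, 1.0, 1.0]]])   # a 1x1 pure-white "image"
print(np.round(rgb_to_lab(white), 2))   # white has L near 100, a and b near 0
```

The model sees only the L plane as input and predicts the a/b planes.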
The model itself is a fairly standard convolutional neural network. The authors did not use any pooling layers, and instead chose to use upsampling/downsampling layers.
As briefly mentioned above, the authors used a classification model instead of a regression one, so the number of classes needs to be fixed. They quantised the AB space into 313 bins and used those as the classes. Even though this may seem like a very low number, they used methods to ensure more colour values are possible (which will be discussed later in this post).
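A toy sketch of the quantisation idea. The bin width of 10 matches the paper; the flat 22×22 grid indexing and the ab value range are my simplification, since the paper keeps only the 313 bins that are in-gamut for natural images:

```python
import numpy as np

BIN = 10      # bin width in ab units, as in the paper
A_MIN = -110  # assumed lower bound of the ab range (my simplification)
GRID = 22     # 22 bins per axis covers [-110, 110)

def ab_to_bin(a, b):
    """Map a continuous (a, b) pair to a single integer bin index."""
    ai = int((a - A_MIN) // BIN)
    bi = int((b - A_MIN) // BIN)
    return ai * GRID + bi

# The neutral colour (a=0, b=0) lands in the centre of the grid.
print(ab_to_bin(0.0, 0.0))  # 11 * 22 + 11 = 253
```

Each pixel’s ground-truth colour becomes a class label like this, and the network predicts a distribution over the bins.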
The loss function that the authors used was the standard cross entropy, where Z is the actual class of a pixel and Z hat is the output of the model.
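The equation image did not carry over here; reconstructed in the paper’s notation (with h, w indexing pixels and q indexing the colour classes), the cross-entropy summed over all pixels is:

```latex
L_{cl}(\hat{Z}, Z) = -\sum_{h,w} \sum_{q} Z_{h,w,q} \log \hat{Z}_{h,w,q}
```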
The authors also noted that there would be class imbalance among the colour values. Cross entropy does not play very well with class imbalance, and the usual remedy is to give classes with fewer examples a higher weight. The authors noted that desaturated colours like grey and light blue are abundant compared to others because of their appearance in backgrounds. Therefore they came up with their own weighting scheme.
The authors calculate ~p, the empirical distribution of the classes, from the ImageNet database. Remember that Q is the number of classes (313). The authors found that a λ value of 0.5 worked well. Note that they also smooth the distribution ~p, but I will skip the details here; if you are interested, you can read about it in the original paper.
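A sketch of the rebalancing weights, w ∝ ((1−λ)·~p + λ/Q)⁻¹, normalised so that the expected weight under ~p is 1. The distribution below is random stand-in data, not the real ImageNet statistics:

```python
import numpy as np

Q = 313    # number of colour classes
lam = 0.5  # the lambda value the authors found to work well

# Hypothetical empirical class distribution ~p (random for illustration;
# the paper estimates it from ab values in ImageNet, then smooths it).
rng = np.random.default_rng(0)
p_tilde = rng.random(Q)
p_tilde /= p_tilde.sum()

# Mix the empirical distribution with a uniform one, then invert:
# rare colours get large weights, common colours get small ones.
w = 1.0 / ((1 - lam) * p_tilde + lam / Q)
# Normalise so the expected weight under p_tilde equals 1.
w /= (p_tilde * w).sum()

print(round(w.min(), 3), round(w.max(), 3))
```

Blending with the uniform distribution (the λ/Q term) keeps the weights for very rare colours from blowing up.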
So after taking the weights into account, the final loss function looks like this:
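Reconstructed from the paper, it is the cross entropy from before with the per-pixel weight v applied:

```latex
L_{cl}(\hat{Z}, Z) = -\sum_{h,w} v(Z_{h,w}) \sum_{q} Z_{h,w,q} \log \hat{Z}_{h,w,q}
```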
The new term v() is just the value of the weight for each class; h and w index the height and width of the image, respectively.
Using the 313 classes directly to colour images would be too coarse: there are simply too few colours to realistically represent the true range of colours in natural images.
The authors used a post processing step in order to get a more diverse colour range from the model’s predictions.
H is a function of Z, the output of the model, and T is a temperature hyper-parameter for which the authors experimented with a few different values.
This is a good step because the model’s output contains very valuable information about the class probabilities. Instead of just taking the class with maximum probability (as we do in image classification), the function above utilises the information present in the entire probability distribution of the model’s output.
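A sketch of this ‘annealed mean’ in NumPy: sharpen the predicted distribution with temperature T, then take its expectation over the bin centres. The bin centres below are made up for illustration; the paper reports that T ≈ 0.38 works well:

```python
import numpy as np

def annealed_mean(probs, bin_centers, T=0.38):
    """Sharpen a distribution over colour bins with temperature T, then
    take its expectation. T -> 0 approaches the argmax colour; T = 1
    gives the plain (bland, averaged) mean."""
    logits = np.log(probs + 1e-12) / T
    logits -= logits.max()      # subtract max for numerical stability
    q = np.exp(logits)
    q /= q.sum()
    return q @ bin_centers      # expected ab value under the sharpened dist.

# Toy example: 3 bins with hypothetical ab-space centres.
probs = np.array([0.6, 0.3, 0.1])
centers = np.array([[10.0, -20.0], [40.0, 5.0], [-30.0, 60.0]])
print(annealed_mean(probs, centers))
```

Intermediate values of T trade off between the vibrancy of the argmax and the spatial consistency of the mean, which is exactly the knob the authors tune.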
Training such a network is divided into two parts. First, the data is passed through the model (forward pass) and the final prediction is calculated. Then, to calculate the loss, the ground-truth colours are encoded into the class space, i.e. the inverse of H is applied.
The results are much more vibrant and, in most cases, quite close to real. Notice that the output is often not exactly the same as the ground truth, but it is still semantically correct (the model colours the right objects with the right colours).
In this article, we discussed a novel way to colourise images using a modified loss function. We talked about how vibrancy can be controlled using a hyper-parameter, and why class rebalancing plays an important role in colourising natural images.
If you liked this article, hold that clap icon for as long as you think this article is worth it. I am always looking for feedback to improve my articles. If you have suggestions or questions, feel free to respond.