Data driven algorithms like neural networks have taken the world by storm. Their recent surge is due to several factors, including cheap and powerful hardware, and vast amounts of data. Neural Networks are currently the state of the art when it comes to ‘cognitive’ tasks like image recognition, natural language understanding , etc. ,but they don’t have to be limited to such tasks. In this post I will discuss a way to compress images using Neural Networks to achieve state of the art performance in image compression , at a considerably faster speed.
This article is based on An End-to-End Compression Framework Based on Convolutional Neural Networks
This article assumes some familiarity with neural networks , including convolutions and loss functions.
Image compression is the process of converting an image so that it occupies less space. Simply storing the images would take up a lot of space, so there are codecs, such as JPEG and PNG that aim to reduce the size of the original image.
There are two types of image compression :Lossless and Lossy. As their names suggest, in Lossless compression, it is possible to get back all the data of the original image, while in Lossy, some of the data is lost during the convsersion.
E.g. JPG is a lossy algorithm, while PNG is a lossless algorithm
Notice the image on the right has many blocky artifiacts. This is how the information is getting lost. Nearby pixels of similar colors are compressed as one area, saving space, but also losing the information about the actual pixels. Of course, the actual algorithms that codecs like JGEG , PNG etc use are much more sophisticated, but this is a good intuitive example of lossy compression. Losless is good, but it ends up taking a lot of space on disk.
There are better ways to compress images without losing much information, but they are quite slow, and many use iterative approaches, which means they cannot be run in parallel over multiple CPU cores , or GPUs. This renders them quite impractical in everyday usage.
If anything needs to be computed and it can be approximated , throw a neural network at it. The authors used a fairly standard Convolutional Neural Network to improve image compression. Their method not only performs at par with the ‘better ways’ (if not even better), it can also leverage parallel computing, resulting in a dramatic speed increase.
The reasoning behind it is that convolution neural networks(CNN) are very good at extracting spatial information from images, which are then represented in a more compact form (e.g. only the ‘important’ bits of an image are stored). The authors wanted to leverage this capability of CNNs to be able to better represent images.
The authors proposed a dual network. The first network , which will take the image and generate a compact representation(ComCNN). The output of this network will then be processed by a standard codec (e.g. JPEG). After going through the codec, the image will be passed to a 2nd network, which will ‘fix’ the image from the codec, trying to get back the original image. The authors called it Reconstructive CNN (RecCNN). Both networks are iteratively trained, similar to a GAN.
Figure 2.0 : ComCNN The compact representation is passed on to a standard codec
Figure 2.1 RecCNN. The output from ComCNN is upscaled and passed to RecCNN, which will attempt to learn a residual
The output from the codec is upscaled , then passed to RecCNN. The RecCNN will try to output an image that looks as similar to original image as possible.
Figure 2.2. End to end framework to compress images. Co(.) represents an image compression algorithm. The authors used JPEG , JPEG2000, and BPG.
The residual can be thought of as a post processing step to ‘improve’ the image that the codec decodes. The neural network, which has a lot of ‘information’ about the world, can make cognitive decisions about what to ‘fix’. This idea is based on residual learning , and you can read about it in depth here.
Since there are two networks, two loss functions are used. The first one, for ComCNN, labelled as L1 is defined as:
Equation 1.0 Loss function for ComCNN
This equation may look complicated, but it is actually the standard (Mean Squared Error)MSE. The ||²s signify the ‘norm’ of the vector that they enclose.
Equation 1.1
Cr denotes the output of the ComCNN. θ denotes the trainable parameters of ComCNN, and Xk denotes the input image
Equestion 1.2
Re() denotes the RecCNN. This equation just passes the value of Equation 1.1 to the RecCNN. θ hat denotes the trainable parameters of RecCNN (the hat denotes that the parameters are fixed)
Equation 1.0 will make ComCNN modify its weights such that , after being recreated by RecCNN, the final image will look as close to the real input image as possible.
The second loss function , for RecCNN, is defined as :
Equation 2.0
Again, the function may look complicated, but it is a mostly standard neural network loss function (MSE ).
Equation 2.1
Co() denotes the output from the codec. x hat denotes the output from ComCNN. θ2 denotes the trainable parameters of RecCNN. res() just represents the residual that the network has learned. It is just the output of RecCNN. It is worth noting that RecCNN is trained on the difference between Co() and input image , and not directly from the input image.
Equation 2.0 will make RecCNN modify its weights such that the output of it looks as close to the original image as possible.
The models are trained iteratively, similar to the way GANs are trained. One model’s weights are fixed while the other model’s weights are updated, then the other model’s weights are fixed, while the first model is trained.
The authors compared their method with existing methods, including plain codecs. Their method performs better than other methods, while maintaining high speeds when used on capable hardware. The authors tried using just one of the networks , and they noted that there were performance drops.
Figure3.0 SSIM(Structural Similarity index) Comparison. Higher values indicate better similarity to original. The authors’ work are in bold.
We took a look at a novel way of applying deep learning to compress images. We talked about the possibility of using neural networks on tasks besides ‘common’ ones such as image classification and language processing. This method is not only as good at the state of the art , but it can process images at a much faster rate.
If you found use of this article, please ❤ it or 👏🏻 for it.It means the world to me. If you have any improvements, suggestions, or want me to cover publications that you think are interesting, feel free to respond to this article!.