In fast.ai’s Deep Learning Part 2 with @jeremyphoward and @math_rachel we’ve been learning about generative models, which inspired me to experiment in some new directions. I’ll show you the results of these experiments here and describe how they were done. For complete comprehension, you should be familiar with CNNs, loss functions, etc; if you’re not, check out Practical Deep Learning For Coders, Part 1 (the MOOC version of part 1 of this course).
Ever since Gatys and friends introduced artistic style-transfer, the deep learning community has been stoked about creating images in the style of famous artists. And for good reason! The results have been consistently stunning; style-transfer has allowed us to redraw images with neural networks in ways that simple filters could not hope to imitate.
For those unfamiliar, artistic style-transfer is a method of creating an image with the content of image X but with the style of image Y. If the words “convolutional neural network”, “MSE”, and “back-propagation” are gibberish to you, feel free to skip the next three paragraphs (but keep an eye out for my upcoming complete non-technical guide to style transfer!). If you are familiar with these topics, the following is a brief overview of the original style-transfer implementation.
The original approach by Gatys starts by taking three images, the content image, style image, and a random “noise” image, and puts them through the convolutional layers of some CNN that’s been pre-trained for image recognition (typically Vgg16/19). The loss function for this method is to compare convolutional layer outputs for both the content and style images to the random image. Then, we update the random image’s pixels through back-propagation to minimize this loss.
The metric for “comparison” differs for the content and style image; for content, our loss is just the MSE between the content and random image outputs. This updates the random image to eventually match the content image. Exactly how strictly it matches the original image depends on how deep of a layer you use for comparison.
The revelation reached in the original paper lies in the loss computed for the style; instead of computing the MSE between convolutional outputs for the style/random images, we compute the MSE between the Gramian matrices of these outputs. Back-propagation on this loss results in an image that has the style of the original image, with none of the content. This method essentially deconstructs the style image into one that consists of raw artistic style, with no structure. I like to refer to this as the style image’s palette.
Style-transfer is simply the combination of both of these approaches, which allows us to create an image with the content of one picture and the style of another.
In the above example, my content image is Kanye West, and my style image is Pablo Picasso’s “Girl Before a Mirror”. The image on the bottom left is an example of the palette deconstructed from the painting. The last image is the result of combining content and style loss when reconstructing the image; we’ve rebuilt the image of Kanye with the style of Picasso.
And yet, the phrase “artistic style-transfer” slightly underestimates what we can actually accomplish with this method. There are many examples in the literature of transferring the style of works of art to images; however, there are few examples of transferring non-artistic styles to images.
Attempting to apply non-artistic “styles” to an image is a natural extension of the artistic style-transfer approach. For example, we might wonder if we could apply the “style” of a guitar to an image.
This implementation is… mediocre at best. We can see that our resulting image does show some of the fretboard style, but it’s not particularly strong and is relatively smoothed out. When we look at the palette that’s extracted from this image, the fretboard pattern isn’t prevalent.
This could simply be due to a poor choice of image. Perhaps this will work with other non-artistic images:
Nope. Neither the NES controller or Captain Picard image were capable of imparting a “NES controller/Picard” style to these images. As with the guitar, the palettes extracted from these images don’t exactly capture what we want them to.
If you have some experience with style-transfer, this will come as no surprise. When we say “style” in style-transfer, we are referring to the style in which the image is drawn. In these three images, there is no “style” in which these images were drawn. With Picasso’s Girl Before a Mirror, the style we replicate consists of the brush strokes and impressionistic constructions that the painting consists of. Thus, in order for style-transfer to extract a style, it must be something that is prevalent throughout the image.
How can we construct an image that consists of the non-artistic style we wish to transfer?
Just tile it.
This approach disseminates the style we want to transfer much more prominently then just applying the single image. The guitar image clearly displays the polygonal patterns and lines that we might attribute to a fretboard. The NES controller image displays patterns similar to the rectangular structures on the D-pad, as well as the bold red buttons.
The reason this approach works is because it allows us to reverse-engineer a palette that we want to draw a new image with. Since our goal is to draw an image in the style of a NES Controller, we need an image that has been drawn in that style in order to extract that palette. However, since such an image is exactly what we’re after (and likely does not exist), we need to manufacture a synthetic example of this ourselves. Tiling does exactly this.
But exactly how far can we take this? Can we use this tiling approach for a “style” that is something like a face?
This is much better, we can see that the palette is picking up portions of a face as the style. But it’s losing the structure of the face as a whole. If we think of the palette as the brush strokes that created the style image, then intuition should tell us that we can tune how much of our desired image constitutes a “brush stroke” by tuning the granularity of the tiling. Thus, if we want the entire face to be captured by a brush stroke, we should make the tiles even smaller.
Holy smokes. That is clearly an image that is constructed with the style of Captain Picard’s face!
Looking at the palette for the tiled image, we can see that the smaller tiling gives a style that is much better at transferring the entire face, without losing some of the abstraction we desire. If we tiled the guitar and NES controller images in the same way, we’d likely get similar results.
I find this tiling approach fascinating because it pushes style-transfer outside the realm of fine art. While there’s no limit to what we can accomplish with artistic style-transfer, non-artistic style transfer opens the door to infinitely more possibilities.
If you’re interested in easily experimenting with these techniques, (and possibly drawing your friend’s face with Justin Bieber), I recommend Anish Athalye’s python implementation of neural-style transfer.
If you really want to understand how this works, keep an eye out for Jeremy Howard’s lecture on the subject in his Practical Deep Learning For Coders, Part 2 course, the MOOC of which will be live here in May, 2017.
If you have no experience with neural networks, and would like to learn more about the tech behind the current AI surge as well as how to implement it, I highly encourage you to look at Jeremy Howard and fast.ai’s Practical Deep Learning For Coders, Part 1 course (Full disclosure: I am an intern with fast.ai and primary contributor to course notes for Part 1).
Create your free account to unlock your custom reading experience.