A documentation of my side project’s step by step trial and error.
I was walking around San Francisco MoMA with my fiancee and entered a room with a massive awe inspiring mosaic. Jaw dropping.
This thing was massive. It covered just about the entire wall. The image grew in detail the further you walked away from it, but broke up into intricate abstract shapes as you approached.
I wondered how I’d look like in a mosaic. Vain thought, I know. But why stop there, how would your loved one, pet, car, even food look like if it was drawn in such a fashion?
Is it possible to create something similar with an algorithm? Can we abstract a similar process and apply it to arbitrary subjects? I wanted to create something that could produce images I could hang on my wall but for others to utilize creatively for themselves as well. Whether it’s creating a mosaic selfie or putting an artistic spin on their favorite photo that they took. Bring a process to the hands of a wider audience.
I’m going to beat around the bush and not mention the artist’s name that was the source of inspiration because I’ve found articles online claiming he does not want his name referenced in personal projects, so I’ll respect their wishes. However, you astute readers should know the source of my inspiration. Let’s hop into some nitty gritty.
With the recent leaps and bounds of machine learning, particularly deep neural networks, I figure I start there. Let’s use a deep neural network to extract the style of an artist’s painting and then apply those features to an arbitrary photo. In my case, I chose a random selfie taken during a cold afternoon. I used a popular open source project based off neural-style. The end results were interesting albeit disappointing.
I welcome others to try. Perhaps you’ll have better results and I used improper parameters. I also don’t own a GPU (or bothered to lease servers in the cloud). Training the neural networks using my MacBook Pro on a low resolution image took half an afternoon before I was able to see results. Aint nobody got time for that!
Instead of utilizing machine learning, let’s try to define an algorithm. Grid the picture, average the cell’s colors, then draw a bunch of nested shapes with various colors that are formulated from the average of the cell’s colors. This won’t be too bad (famous last words). I’ve been coding just shy of two decades and I still need to slap myself because it’s still too often that I take things for granted.
I litter some code throughout the article that I use within my project. If you’re not interested in them, skip and ignore them. Understanding them is not imperative. Although those with coding backgrounds may find interesting (or not).
On my first iteration, I split the image into a grid and using Pillow, I draw distorted nested circles. Each nested circle, is either red green or blue (RGB). The shade of RGB was generated from the average color of the cell.
When the red, green, blue nested circles are drawn together, we get the appearance of an average that is close to the color of the original cell. The distortion / waviness of the circles was getting ahead of myself by focusing on the details than the big picture.
Not too pretty, and very blurry. Okay, how about if I randomize the drawing of each nested circle, but still base it off the average of the original cell?
A smidgen more interesting, but still disappointing. You can still make out the picture, but still blurry. We need to sharpen the outlines to give some clarity to the picture. What if we stretch the circles and orient it with the slope of the outline? This will mimic an outline even though we’re still drawing circles.
To get edges and outlines of an image we can use canny edge detection.
To find the slope we can use numpy’s polyfit to get the first order polynomial. Now that we have the edges in a 2d array, we can get the slope of these edges. We are attempting to find the coefficient m in y=mx+b.
A quick test:
>>> cell = np.array([[0,0,1],[0,1,0],[1,0,0]])
array([[0, 0, 1],
[0, 1, 0],
[1, 0, 0]])
For easy eye-spotting debugging purposes, I highlighted the circles as red to indicate a positive slope and green being a negative slope. We can see that the circles roughly follow the slope of the edges.
However, integrating the directional circles didn’t help much (as you can see below on the far left image). The below set of images of Bill Murray’s portrait (I was watching Lost in Translation during the holidays which is why I chose his image) shows some attempts and experimentations. Nothing really stuck.
Okay, back to the drawing board. Let’s take a different approach. Keep it simple, none of these squiggly circles and rotated circles that match the slope of outlines. Let’s draw nested rectangles and triangles. When we encounter an outline / edge, we draw the shape with two colors instead of only one color of the cell’s average color. This will give the effect of an outline.
The above images displays the progress as we get closer and closer to the end goal. We have the glasses clearly outlined that almost gratifyingly fit together like puzzle pieces.
To determine which shape to draw in each cell, we try various shapes and measure how similar this shape is to representing the original image. “Likeness” is quantified with the root-mean-squared difference. The shape with the lowest difference gets drawn.
What about color? Like various tints of our average color, we can select a variety of colors that when juxtaposed should give the perception of our original color.
Using the average color of a cell, we can generate additional colors to use for our nested shapes.
I learned that although colors whose RGB summed to equivalent values will be perceived with different brightness levels because our eyes aren’t sensitive to colors equally. This is known as relative luminance. We can see from the middle image above that certain colors (ex. green) tend to overwhelm others.
To offset this over powering color, we tint the complementary colors to the luminance of our original color.
Alright, we have some base algorithms going. Let’s apply this to a few images. It’s worth noting that the original images by themselves are
There’s some additional creative things like combining multiple frames with various settings to create gifs:
More experimentations with colors and color palettes. I am pretty happy with the shapes being drawn. The next step is to continue experimenting with various colors and perhaps generate preset palettes. By converting an image into a mosaic you naturally remove detail from the photo. Thus, what holds up an image is usually going to be a dynamic range of colors and contrast. It would be great if given an image that doesn’t have such ranges that I can provide the user a set of color palettes to enhance the image prior to applying the mosaic effect.
Performance updates. snakeviz has been invaluable in profiling functions. The initial implementation took 3–5 minutes to process a 3000x3000 px image. It is now brought down to a rough 30–60 seconds dependent on the size of the grid. This is always an ongoing effort.
Retrospective learnings from the project
How master artists are able to intuitively select colors and do this by hand is nothing short of amazing. Mind blowing, really.
Now the shameless plug. Like what you see and want to create your own mosaics?
Get the (still rough) Python code on github