This Neural Network Paints Depressing Russian Cityscapes

by ferluht, January 8th, 2021

Too Long; Didn't Read

Neurussia is a neural network that paints Russian landscapes based on Soviet architectural heritage. The data came from VK, the largest Russian social network: I collected around 10k images of Russian landscapes using a Python wrapper for the VK API. For training, I used Nvidia’s StyleGAN2 architecture with discriminator augmentations. The official implementation stores training data in the .tfrecord format without compression, which on Colab can drastically reduce the number of images you can process.

Soviet architectural heritage is repulsive and romantic at the same time. In every city, from Vladivostok to Kaliningrad, stand the same Plattenbau buildings. As a result, many people feel that this panel architecture is part of their cultural code and nature: depressing and chthonic.

Here in Russia, we even have a popular slogan, “Russia’s for Sad” (a play on the right-wing slogan “Russia’s for Russians”). People produce a lot of sad content (music, footage, art) based on these architectural references. So one day I thought: what if a neural network were trained to paint this Sad Russia? What if a soulless algorithm could draw landscapes that are a pleasure for the soul of a generation of Russian doomers? That’s how the Neurussia project was born.

Part 1: Data mining

For the data source, I chose the largest Russian social network, VK. Here is why: VK has a post suggestion mechanism that lets people submit their own content for publication in large groups. As a result, VK contains a lot more thematic visual content than other social networks. For example, the largest group about the aesthetics of Russian ghettos contains around 200 thousand images! Here is what I got initially from VK:

For data scraping, I used a Python wrapper for the VK API. In case anyone would like to reproduce my pipeline, here are the API method descriptions, and below is an example of how to download the images from the latest group post.

import getpass
import urllib.request

import vk_api


def captcha_handler(captcha):
    # Called by vk_api when VK asks for a captcha: show the image URL, then retry with the answer.
    print(f"url: {captcha.get_url()}\n")
    key = input("Enter captcha code: ")
    return captcha.try_again(key)


def auth_handler():
    # Called by vk_api when two-factor authentication is required.
    code = input("Enter 2FA code: ")
    remember_device = True
    return code, remember_device


def save_post_pictures(post, imres):
    # Download every photo attached to the post at the requested size type.
    for attachment in post.get('attachments', []):  # tolerate posts without attachments
        if attachment['type'] != 'photo':
            continue
        for size in attachment['photo']['sizes']:
            if size['type'] == imres:
                url = size['url']
                # File name: last two URL path segments, without the query string.
                filename = '_'.join(url.split('/')[-2:]).split('?')[0]
                urllib.request.urlretrieve(url, filename)


phone = input("phone ")
password = getpass.getpass("password ")  # read the password without echoing it
domain = 'yebenya'  # vk.com/yebenya

sess = vk_api.VkApi(phone, password, captcha_handler=captcha_handler, auth_handler=auth_handler)
sess.auth()
api = sess.get_api()

posts = api.wall.get(domain=domain, count=1)['items']
save_post_pictures(posts[0], 'z')  # https://vk.com/dev/photo_sizes - z max size
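
The example above fetches only the latest post. Collecting thousands of images means walking the whole wall page by page, since wall.get returns a limited number of posts per call and accepts an offset. The following is a minimal sketch of such a loop, not the code used for the project: `iter_wall_posts`, `fetch_page` and `fake_fetch_page` are hypothetical names, and the fake wall only stands in for the real API so the sketch runs offline.

```python
def iter_wall_posts(fetch_page, page_size=100, limit=None):
    """Yield posts page by page until the wall is exhausted or `limit` is reached.

    `fetch_page(offset=..., count=...)` is a hypothetical callable that must
    return a dict shaped like a wall.get response: {'count': total, 'items': [...]}.
    """
    offset = 0
    yielded = 0
    while True:
        page = fetch_page(offset=offset, count=page_size)
        items = page.get('items', [])
        if not items:
            return  # empty page: nothing left to read
        for post in items:
            if limit is not None and yielded >= limit:
                return
            yield post
            yielded += 1
        offset += len(items)
        total = page.get('count')
        if total is not None and offset >= total:
            return  # every post reported by the API has been seen


# Stand-in for the real API: a fake wall of 250 posts (illustration only).
def fake_fetch_page(offset, count):
    posts = [{'id': i} for i in range(250)]
    return {'count': len(posts), 'items': posts[offset:offset + count]}


all_posts = list(iter_wall_posts(fake_fetch_page))
first_posts = list(iter_wall_posts(fake_fetch_page, limit=120))
```

With the session from the example above, `fetch_page` could be something like `lambda offset, count: api.wall.get(domain=domain, offset=offset, count=count)`, with each yielded post passed to `save_post_pictures`.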

To filter the downloaded images, I used EfficientDet and NLTK. The idea behind the neural-network detector is to find every object it can and throw out the images that contain them, leaving only landscape shots. The Natural Language Toolkit helped me filter out posts by their captions using stemming. In the end, I got around 10k images of Russian landscapes.
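
The caption filter can be sketched roughly as follows. This is an illustration under assumptions, not the filter actually used: it relies on NLTK's Snowball stemmer for Russian, the blocklist words ("продам", "реклама", "конкурс") are made up for the example, and `stems` and `keep_post` are hypothetical helper names. Comparing stems instead of raw words lets one blocklist entry match its inflected forms.

```python
import re

from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("russian")
TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of letters, Cyrillic included


def stems(text):
    """Return the set of stems of all words in the text."""
    return {stemmer.stem(token) for token in TOKEN_RE.findall(text.lower())}


# Hypothetical blocklist: words that suggest a post is an ad or a contest.
BLOCKED_STEMS = stems("продам реклама конкурс")


def keep_post(post):
    """Keep a post only if its caption shares no stem with the blocklist."""
    caption = post.get("text", "")  # wall posts carry their caption in 'text'
    return not (stems(caption) & BLOCKED_STEMS)


kept = [p for p in [{"text": "Вечерний двор"}, {"text": "Реклама: продам гараж"}]
        if keep_post(p)]
```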

Part 2: StyleGAN training

I used Nvidia’s famous StyleGAN2 architecture with discriminator augmentations. The authors claim that the augmentations give better results on small datasets (under 30k images). Here is the official code, and here is my “duct tape” modification for Colab training and video generation (more on it below).

The official paper implementation uses the .tfrecord format to store data for training (how to convert images to .tfrecord is described here). On Colab, this can drastically reduce the number of images you can process, because the official code doesn’t use any compression in tfrecords.

However, there is another modification that uses compression and should probably let you work with larger datasets on Colab (I haven’t tried it yet).
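
To get a feel for why uncompressed storage hurts, here is a back-of-the-envelope estimate for a dataset like this one. It assumes 8-bit RGB images stored as raw pixels, and, as a further assumption about the conversion tool, that a downscaled copy of every image is also written for each resolution down to 4×4. The helper names are made up for the example.

```python
def raw_bytes_per_image(resolution, channels=3):
    """Size of one uncompressed 8-bit image: one byte per channel per pixel."""
    return resolution * resolution * channels


def pyramid_bytes_per_image(resolution, channels=3, min_resolution=4):
    """Size of the image plus all its halved copies down to min_resolution (assumption)."""
    total = 0
    res = resolution
    while res >= min_resolution:
        total += raw_bytes_per_image(res, channels)
        res //= 2
    return total


NUM_IMAGES = 10_000
GIB = 1024 ** 3

single = NUM_IMAGES * raw_bytes_per_image(1024) / GIB
pyramid = NUM_IMAGES * pyramid_bytes_per_image(1024) / GIB
print(f"full resolution only: {single:.1f} GiB")
print(f"with lower-resolution copies: {pyramid:.1f} GiB")
```

Under these assumptions, 10k images at 1024 resolution take roughly 29 GiB on their own and roughly 39 GiB with the lower-resolution copies, far more than the same pictures occupy as JPEG files.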

On Colab, each epoch takes about half an hour for a dataset of 10k images at 1024 resolution. Fun fact: during the very first epochs, the color palette rapidly changed from random bright colors to fifty shades of gray (picture above). After a few epochs, I got something like this:

Part 3: Video generation

Once I had the first impressive results, an idea emerged: to create a music video for my musician friends. But how do you make the video at least somewhat consistent with the music? This is where the latent space of the generative model can help.

The thing is, during training the neural network learns to describe a whole picture with a 512-dimensional vector, and all the parameters of the final painting are embedded in this vector. So one can look for combinations of those parameters that correspond to, for example, the time of day, the height of a building, or the number of windows. This can be a complex task in itself (although there are some approaches), so for my purposes I decided to simply interpolate between images, keyframe to keyframe, using their latent vectors. To do this, I changed only a few lines of code in generate.py so that it reads a list of keyframes with their latent vectors and generates N intermediate frames, linearly interpolating between them, at 60 fps.
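
The interpolation step itself can be sketched in a few lines of NumPy. This is not the modified generate.py, only an illustration of the idea: `interpolate_keyframes` is a hypothetical helper, the keyframe latents here are random placeholders, and feeding each resulting vector to the generator to render a frame is left out.

```python
import numpy as np

LATENT_DIM = 512  # dimensionality of the latent vector mentioned above
FPS = 60


def interpolate_keyframes(keyframes, frames_per_segment):
    """Linearly interpolate between consecutive keyframe latent vectors.

    Each segment contributes `frames_per_segment` frames, starting at its
    first keyframe; the last keyframe is appended once at the very end.
    """
    keyframes = np.asarray(keyframes, dtype=np.float64)
    frames = []
    for start, end in zip(keyframes[:-1], keyframes[1:]):
        # endpoint=False: the next segment starts with `end`, so don't emit it twice
        for t in np.linspace(0.0, 1.0, frames_per_segment, endpoint=False):
            frames.append((1.0 - t) * start + t * end)
    frames.append(keyframes[-1])
    return np.stack(frames)


# Three placeholder keyframes, two seconds of video between each pair.
rng = np.random.RandomState(0)
keyframes = rng.randn(3, LATENT_DIM)
latents = interpolate_keyframes(keyframes, frames_per_segment=2 * FPS)
```

Tying `frames_per_segment` to the frame rate is what makes it possible to line keyframes up with moments in the music: a keyframe placed every two seconds of the track means 120 frames per segment at 60 fps.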

Conclusion

With the advent of approaches such as StyleGAN, generative art has seen a new wave of growth. Some people are even selling generative art at Christie’s auctions without writing a line of code. Colab also drastically lowers the barrier to entry and the time it takes to get something interesting working. I hope this article helps someone expand their creativity with modern AI methods.