Hi! My name is Oleksii Avilov. I work as an ML engineer at the Ukrainian startup ZibraAI, where I do research, automation, and the development of ML solutions. I mainly work on image-to-3D generation tasks, and now also on text-to-image.
Many 3D artists and illustrators believe that artificial intelligence will eventually replace them. I look at this challenge differently. At ZibraAI, we develop solutions that simplify the work of game designers using artificial neural networks. We don’t want to replace people; we strive to help them by making the most time-consuming and boring processes faster.
For the past few years, I have been fascinated by generative art – the art generated with the help of artificial intelligence.
At the beginning of last year, we worked on 3D model generation and tested different approaches, including text-to-image generation. It started rather as a hobby, and it probably would have remained a hobby, if not for the war.
In this blog, I want to cover different approaches to generating images using artificial intelligence and share the story of how we used generative art to draw attention to the war in Ukraine.
The first attempts to generate images from text began in the mid-2010s, with the appearance of Generative Adversarial Networks (GANs).
A Generative Adversarial Network is a system of two artificial neural networks that compete with each other. One network (the generator) generates images from textual descriptions, while the other (the discriminator) evaluates them.
During the training phase, the goal of the generator is to trick the discriminator by creating a synthesized image as similar to the real one as possible. The task of the discriminator is to accurately distinguish real images from synthesized ones.
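The two losses in this minimax game fit in a few lines. Below is a small numpy sketch of the standard binary cross-entropy discriminator loss and the non-saturating generator loss; the score values are made-up stand-ins for discriminator outputs, not from any particular implementation:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss: push D(fake) toward 1
    return -np.mean(np.log(d_fake))

# Toy discriminator scores (probabilities that an image is real)
d_real = np.array([0.9, 0.8])   # discriminator is confident on real images
d_fake = np.array([0.1, 0.2])   # ...and correctly rejects the fakes

print(discriminator_loss(d_real, d_fake))  # low: the discriminator is winning
print(generator_loss(d_fake))              # high: the generator must improve
```

When the generator improves and `d_fake` rises toward 1, the generator loss falls while the discriminator loss grows: that tug-of-war is the training signal.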
Here is an example of such a generation created in 2016:
Later, several alternative algorithms for generating images from text queries appeared, but they showed no visible quality improvement over GANs. A good overview of image generation, from its early days to today, can be found here.
Only at the beginning of last year did major changes come to the text-to-image field. OpenAI introduced two solutions that, in my opinion, started the revolution in image generation we observe now: DALL·E and CLIP.
The DALL·E neural network is based on GPT-3, the third generation of OpenAI's natural language processing algorithm. It has a transformer architecture that extends text sequences with special image tokens, which are then transformed into images by another model (the decoder).
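The token mechanics can be sketched in a few lines. In the toy example below, all vocabulary sizes and token ids are invented; the point is just how text tokens and discrete image-codebook tokens form one sequence for the transformer, and how a decoder's codebook maps image tokens back to pixels:

```python
import numpy as np

# DALL·E-style setup (all sizes invented for illustration):
# text is tokenized into ids from a text vocabulary, and the image is
# represented as a small grid of discrete codes from a separate codebook.
TEXT_VOCAB = 100
IMAGE_VOCAB = 16

text_tokens = [12, 7, 55]        # "avocado armchair" (made-up ids)
image_tokens = [3, 14, 9, 1]     # a 2x2 grid of codebook ids

# The transformer sees one combined sequence: text ids, then image ids
# shifted into their own range so the two vocabularies don't collide.
sequence = text_tokens + [TEXT_VOCAB + t for t in image_tokens]
print(sequence)  # → [12, 7, 55, 103, 114, 109, 101]

# A decoder (a dVAE in DALL·E) maps each image code back to pixels;
# here the toy "codebook" just stores one grayscale value per code.
codebook = np.linspace(0.0, 1.0, IMAGE_VOCAB)
pixels = codebook[image_tokens].reshape(2, 2)
```

At generation time, the transformer predicts the image tokens one by one after the text tokens; the decoder then turns that grid of codes into an actual picture.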
Compared to previous solutions, DALL·E showed great progress in the quality of synthesized images. The level of generalization that characterizes the network deserves special attention: thanks to it, DALL·E can generate samples it has never seen during training (examples that are missing from the training dataset).
For example, these avocado-shaped chairs have become OpenAI's signature.
CLIP got less hype upon release, but I believe it made a more significant contribution to the development of text-to-image generation. CLIP links images to text remarkably well and consists of two encoders, one for text and one for images.
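The idea is easy to sketch: both encoders map their input into the same embedding space, and matching image-text pairs end up close together. Here is a toy numpy illustration with made-up 4-dimensional embeddings standing in for encoder outputs (real CLIP embeddings are 512- or 768-dimensional):

```python
import numpy as np

def normalize(x):
    # Project embeddings onto the unit sphere so a dot product = cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for the outputs of CLIP's two encoders (made-up 4-d vectors)
image_embeds = normalize(np.array([[0.9, 0.1, 0.0, 0.1],    # photo of a dog
                                   [0.0, 0.8, 0.6, 0.0]]))  # photo of a cat
text_embeds  = normalize(np.array([[1.0, 0.0, 0.1, 0.0],    # "a photo of a dog"
                                   [0.1, 0.9, 0.5, 0.0]]))  # "a photo of a cat"

# Pairwise cosine similarity between every image and every caption
similarity = image_embeds @ text_embeds.T
best_caption = similarity.argmax(axis=1)
print(best_caption)  # → [0 1]: each image matches its own caption
```

During training, CLIP pushes the diagonal of that similarity matrix up and everything else down (a contrastive objective), which is exactly the property text-to-image systems exploit for guidance.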
Unlike DALL·E, the trained CLIP weights were made freely available by the developers. After that, many CLIP-based text-to-image solutions appeared.
And since then, something crazy has been going on in this industry.
There’s an ongoing competition between open source solutions from the community of independent developers (like Disco Diffusion, Latent Diffusion, Stable Diffusion) and commercial models from large (and some not-so-large) companies (DALL·E 2 from OpenAI, Imagen from Google Research, Midjourney, and others). More and more generative content appears in the world.
Midjourney.
DALL·E 2 generations:
Image sources in the carousel: [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12].
Imagen generations:
Image sources in the carousel: [1], [2], [3], [4], [5], [6].
Stable Diffusion generations:
Image sources in the carousel: [1], [2], [3], [4], [5], [6], [7], [8], [9].
Although big corporations usually have access to more computing power, many generative artists prefer open source solutions. The advantage of such solutions is that everyone can participate in product development and try it, with the technologies themselves evolving and developing quickly.
Moreover, the choice between open source products and commercial solutions often comes down to censorship. For example, OpenAI prohibits generating images on sensitive topics and bans users after several attempts to generate them. The word "Ukraine" is also prohibited as text input.
Censorship in DALL·E 2
In the first week of the full-scale war, our team decided to use our expertise to help raise funds to rebuild cities destroyed by Russia and to remind the world that the war was not over. That’s how the Sirens Gallery project was born. Upon getting started, we conducted research and chose an approach based on Disco Diffusion.
At that time, version 4.1 was publicly available; version 5.6 has since been released. Disco Diffusion is based on a class-conditional diffusion model guided by CLIP.
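Under the hood, this approach can be pictured as iterative denoising steered by CLIP: start from pure noise and, at every step, remove a little noise while nudging the image in the direction that raises its CLIP similarity to the prompt. Here is a deliberately toy numpy sketch of that loop; the `clip_score`, the schedule, and the step sizes are all invented for illustration and bear no relation to the real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "CLIP score": how well the current image matches the prompt embedding.
prompt_embed = np.array([1.0, 0.0, 0.0])

def clip_score(image):
    # Stand-in for CLIP similarity between the image and the text prompt
    return image @ prompt_embed

# Start from pure noise and iteratively denoise, nudging each step
# toward images that score higher under the (toy) CLIP guide.
image = rng.normal(size=3)
for step in range(50):
    noise_scale = 1.0 - step / 50          # noise shrinks along the schedule
    grad = prompt_embed                    # gradient of the toy score w.r.t. image
    image = image + 0.1 * grad             # guidance step toward the prompt
    image = image + noise_scale * 0.05 * rng.normal(size=3)  # residual noise

print(clip_score(image))  # far above what an unguided random image would score
```

The real pipeline works on pixel grids with a trained denoising network, but the shape of the loop, noise in, guided refinement over many steps, is the same.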
Day 1 — Ghost of Kyiv
We explored the parameters, available models, and art styles, and started generating pictures (you can find a parameters guide here).
But in fact, the technical part was not the hardest one. The biggest challenge was to make the war timeline.
In order to choose the most important events, and write stories about them, we had to review a lot of events and photos. You can't help but take it to heart when you read about yet another atrocity of Russian soldiers, another story of raped women and children, or look at photos from Bucha or Izyum, where the streets are littered with the corpses of civilians.
In total, Sirens Gallery consists of more than 2,000 pictures generated by artificial intelligence based on textual descriptions of the most important war events. 1,991 paintings have been offered for sale as NFTs on the Opensea.io and Paras.id platforms. So far, we have sold part of the paintings for a total of 250,000 UAH (~$7,000). We transferred these funds to three charity projects on the Dobro.ua platform (the report is publicly available).
Our goal was to draw the world community's attention to the horrors that Russia (the terrorist state) commits in Ukraine. At the same time, we wanted to show the courage, bravery, and humanity of Ukrainians and raise funds to help the victims of the war.
Day 85 — Heroes of Azovstal hold defense of Mariupol for 85 days
Technologies evolve constantly, and some solutions now provide more realistic generations. Since we started working on Sirens Gallery, Stable Diffusion, Imagen, Midjourney, and DALL·E 2 have all been released.
The Stable Diffusion team has also created an API, and Photoshop, GIMP, and Blender plugins based on it have already appeared. There is even an option for texture generation.
At Sirens Gallery we never counted on the technology being perfect; the project is rather about what you can already achieve with new technologies. You can find our stories and the corresponding AI-generated illustrations on our social networks, Instagram and Twitter. They are also available on the Sirens Gallery project website.
Day 4 — Russian helicopters get destroyed in Chornobayivka
Recently, together with Save Ukraine, we held the first offline exhibition of AI-generated pictures, which depict the stories of saving Ukrainian children from the war. In the following months, the paintings will be presented on three continents. Afterward, they will be sold in New York. All funds will be transferred to help Ukrainians who became victims of Russian aggression.