
Manipulate Images Using Text Commands via this AI

by Louis Bouchard, September 7th, 2021

Too Long; Didn't Read

AI can generate realistic images in pretty much any domain: real-life humans, cartoons, sketches, etc. Researchers then leveraged it to take an image and edit it following a specific style, like changing it into a cartoon character or transforming any face into a smiling face. This is extremely complicated because the representation in which we edit the images is not human-friendly. The challenge is to edit only the wanted parts and keep everything else the same. Now, you can do that using only text! Learn more in the video…


Researchers first used AI to generate images. Then, they leveraged it to take an image and edit it following a specific style, like changing it into a cartoon character or transforming any face into a smiling face. This required a lot of tweaking, model engineering, and trial and error before achieving something realistic. There have been many advances in this field, most notably StyleGAN, which can generate realistic images in pretty much any domain: real-life humans, cartoons, sketches, etc.

StyleGAN is amazing, but it still takes quite a lot of work to make the results look as intended, which is why many people are trying to understand how these images are made and, especially, how to control them. This is extremely complicated because the representation in which we edit the images is not human-friendly. Instead of a regular image with three color channels (red, green, and blue), it is an extremely dense encoding with hundreds of dimensions that carry information about all the features the image may contain.

This is why understanding and localizing the features we want to change, in order to generate a new version of the same image, requires so much work. The keywords here are “of the same image.” The challenge is to edit only the wanted parts and keep everything else the same. If we change the color of the eyes, we want all other facial features to stay the same.
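To make the idea of "editing only the wanted parts" a bit more concrete, here is a toy sketch in Python. It is not StyleGAN or StyleCLIP code: the 512-dimensional size, the random "smile direction", and the mask over the first 16 dimensions are all assumptions made up for illustration. It only shows the general pattern of nudging a dense code along one direction while leaving the rest of the code untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed size for illustration; the article only says "hundreds of dimensions".
LATENT_DIM = 512


def edit_latent(latent, direction, strength, mask=None):
    """Shift a latent code along a direction, optionally only in masked dimensions."""
    direction = direction / np.linalg.norm(direction)  # unit-length direction
    if mask is not None:
        direction = direction * mask  # zero out the dimensions we want to keep
    return latent + strength * direction


# Random stand-ins: a real system would obtain these from a trained model.
latent = rng.standard_normal(LATENT_DIM)
smile_direction = rng.standard_normal(LATENT_DIM)  # hypothetical edit direction

# Hypothetical mask: pretend only the first 16 dimensions control the feature.
mask = np.zeros(LATENT_DIM)
mask[:16] = 1.0

edited = edit_latent(latent, smile_direction, strength=3.0, mask=mask)

# Everything outside the mask is exactly the same as before the edit.
unchanged = np.array_equal(edited[16:], latent[16:])
print("dimensions outside the mask unchanged:", unchanged)
```

In a real model the features are entangled across many dimensions, so a simple mask like this one is a simplification; finding which parts of the code to move, and how far, is the hard part described above.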

I recently covered various techniques where the researchers tried to make this control much easier for the user by using only a few image examples or quick sketches of what we want to achieve.

Now, you can do that using only text! Learn more in the video…

Watch the video

References

►The full article: https://www.louisbouchard.ai/styleclip/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
►Patashnik, Or, et al., (2021), "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery.", https://arxiv.org/abs/2103.17249
►Code (use with local GUI or colab notebook): https://github.com/orpatashnik/StyleCLIP
►Demo: https://colab.research.google.com/github/orpatashnik/StyleCLIP/blob/main/notebooks/StyleCLIP_global.ipynb
►OpenAI's Distill article for CLIP: Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, and Chris Olah. Multimodal neurons in artificial neural networks. Distill, https://distill.pub/2021/multimodal-neurons/, 2021.