Text-to-image generation is not a new idea. Notably, the Generative Adversarial Network (GAN) architecture, a once-popular deep-learning computer vision algorithm, had generated birds and flowers from text. After more improvements to generative image algorithms, like the hyperrealistic generation of human faces and the GLIDE diffusion model, we now have a deluge of commercial text-to-image generators, ranging from Google's Imagen and TikTok's Greenscreen to OpenAI's DALL-E.

And here comes the new kid on the block, Stable Diffusion.

Before moving ahead, for those who want to understand the nitty-gritty of diffusion models, I strongly encourage you to go through this awesome blog post by Lilian Weng, https://lilianweng.github.io/posts/2021-07-11-diffusion-models/, and this tweet by Tom Goldstein.

The rest of the article is an exercise in using the Stable Diffusion model to generate images given "my name". There is no main purpose for doing so, other than satisfying my curiosity.

Narcissistic as it sounds, I want to generate an image with my name. Be honest, don't we all "google our own name" from time to time? And thus, I went to https://huggingface.co/spaces/stabilityai/stable-diffusion and typed in "Liling Tan".
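(If you'd rather skip the demo queue, here's a minimal sketch of running the same generation locally with Hugging Face's diffusers library. It assumes you have a CUDA GPU, have run `huggingface-cli login`, and have accepted the model license on the Hub; the prompt and output filename are just my name, so swap in your own.)

```python
# pip install diffusers transformers accelerate
import torch
from diffusers import StableDiffusionPipeline

# "CompVis/stable-diffusion-v1-4" is the original public checkpoint;
# accepting its license on the Hugging Face Hub may be required first.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

out = pipe("Liling Tan")  # swap in your own name here

# The pipeline ships with a safety checker that blacks out images it
# flags, much like the demo's "unsafe content" message further below.
if out.nsfw_content_detected and out.nsfw_content_detected[0]:
    print("The safety checker flagged this image as unsafe.")

out.images[0].save("liling_tan.png")
```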
Before I show the generated results, I would like to clarify some specifics about my first and last name, "Liling Tan". In general, it is more probable that the English romanization of my Chinese-character first name refers to a female individual than a male one. Next, my last name Tan (陳) is a common Southern Chinese surname that originates from the Min language; more commonly, you would expect the English translation/romanization of the Mandarin Chinese, "Chen". With K-pop being a global phenomenon, the top-ranked Google result for "Chen" would point you to the Korean singer from the K-pop band EXO. Also, since "Tan" has the same spelling as the color "tan", the top Google searches will end up with results pointing to "a yellowish-brown color".

For reference, here are the control experiment results, if I googled "liling tan".

And now, the results…

Okay, that's definitely nowhere close to how I look. I kind of expected the images to show an Asian female, but the first image was kinda weird. Now, what happens if I lowercase the prompt and re-run the generation?

Alright, the model seems hell-bent on some facial features, sorta generating one older and another younger version of "liling tan".

Interestingly, the model has some internal mechanism to block "unsafe" content and output this error message:

"This image was not displayed because our detection model detected unsafe content."

Then I got really curious: do the two versions of "liling tan" exist IRL (in real life)? So I did the natural thing and ran a reverse image search with https://lens.google.com/.

Image Search Result 1

Image Search Result 2

Hmmm, no results. Let's extend the frame to beyond the face…

And of course, what was I expecting? Surely Google Shopping will take the chance to advertise and sell me something -_-

Maybe the older version generated by the model is more grounded to someone IRL?

Image Search Result 3

Hmmm, no results, so the model kinda generated two unique people that don't exist IRL? But we see two separate dots that indicate two other results exist; let's see the first one at the bottom.

Of course, it will try to sell me something again… What was I expecting? @_@

Let's try the other dot.

Now this is interesting: it's trying to promote a piece inspired by Girl with a Pearl Earring (ca. 1665) by Johannes Vermeer.

But what about "Find image source"? Does it really find the source of the images that the model used to slice, dice and "diffuse"? It's hard to say:

Find the image source - Result 1

Find the image source - Result 2

Find the image source - Result 3

Đợi tí (wait a minute), does that mean that we don't know which images the model has been trained on or used to splice before generating the results?

Now that we've found the underbelly of the results, let's backtrack to the OG paper listed on the original source code, https://github.com/CompVis/stable-diffusion:

Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. In CVPR '22.

According to the paper, the model is pre-trained on the LAION database (https://laion.ai/blog/laion-5b/) and fine-tuned on the Conceptual Captions dataset (https://ai.google.com/research/ConceptualCaptions/):

"Our model is pretrained on the LAION [73] database and finetuned on the Conceptual Captions [74] dataset."

What if we build a reverse image search based on those datasets? Perhaps if we can find the approximate nearest-neighbor images to the outputs generated by the model, then it might be possible to "explain away" which images the model had deemed "salient" enough to diffuse and generate the outputs from my name.

Voila! Here's a search engine to probe the dataset used to train Stable Diffusion: https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator/ (see the end of this post for a sketch of querying the LAION index programmatically).

And here's the list of results of searching for "Liling" in the LAION database.

Ah ha, that is one image that the model must have "chosen to diffuse". It is highly possible that the model somehow "focused" on an image in the training data with "Liling" in the caption and generated an image similar to that image.

Why did you bother to "diffuse <your_name>"?

To conclude the post, this exercise is purely out of curiosity. I wanted to know what the model would generate. But this exercise also highlights some sort of bias when using generative models. While "Liling Tan" isn't stable enough to generate anything close to me or any other top-ranked "Liling Tan" search result, the model seems to be more stable in generating famous people (e.g. John Oliver).

To end this article, here's a result from generating images with my online handle, "alvations", as the prompt…

Hopefully, after reading, you would also try to "diffuse <your_name>"!
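As promised, here's a minimal sketch of probing the LAION index for your own name, using the clip-retrieval client that powers search engines like the one above. The endpoint URL and index name are assumptions based on the public LAION service at the time of writing; check the clip-retrieval README (https://github.com/rom1504/clip-retrieval) for whatever is currently hosted.

```python
# pip install clip-retrieval
from clip_retrieval.clip_client import ClipClient

# Endpoint and index name are assumptions: this was the public LAION
# knn service at the time of writing; see the clip-retrieval README.
client = ClipClient(
    url="https://knn5.laion.ai/knn-service",
    indice_name="laion5B",
    num_images=10,
)

# Text search: which LAION images are CLIP-similar to "Liling"?
for hit in client.query(text="Liling"):
    print(hit["similarity"], hit["caption"], hit["url"])

# Reverse image search: approximate nearest neighbours of the image
# generated earlier (hypothetical filename from the first sketch).
for hit in client.query(image="liling_tan.png"):
    print(hit["similarity"], hit["caption"], hit["url"])
```

Note that the text query is a semantic CLIP-embedding search rather than an exact substring match on captions, so the hits are only an approximation of "images with Liling in the caption".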