Before diving into the details, let me quickly explain what this project is about. I come from the generation that watched Robotech while growing up. Like any kid in those days, I was inspired by animated characters becoming real in my imagination through their artistic expression. More recent inspirations include Kizuna AI, the virtual band Gorillaz, and many others.

For some time now I have been trying to find the shortest, laziest path to creating a virtual YouTuber with technology that's available to everyone right now. It seems plausible to have an AI create video content on demand in a matter of minutes, with results good enough to demonstrate what I believe will be perfected by the generations to come.

Some Early Failures

After trying out the same idea with project AIYA and GPT2, I couldn't get results that were coherent enough without constantly tweaking and adjusting. Even then, it sounded bad and took a couple of hours of work for every video. That's just not a viable workflow! I wanted something quick, intuitive and creativity-boosting, some new uncharted territory! But as always, first tries are doomed to fail. You can see how that first try went in a YouTube video here, and if you don't mind hacky code, you can check the GitHub repo.

Now that GPT3 is finally here, I got excited after seeing all the demos and examples of what it can do. But with it being accessible only to those OpenAI allows (despite the name), I had to look elsewhere. Luckily, I found GPT-Neo, an open-source replica of GPT3's architecture, and chose it as the next best text-generator alternative. This is also possible thanks to the good people at Huggingface, who provided the updated transformer model to the public.

Unlike my older project, which had thousands of lines of code, this one is done in no more than 400 lines of Python. That being said, on to the details.
How Does it Work

Two things are prepared ahead of time: a static image for the video background, and a subject I wanted to cover as the main theme of the video. The background could be another video, like some gameplay or animated GIFs, but I chose to stick with a static background for now.

I would write a quick prompt or a question on the subject of the video and let GPT finish creating the rest. I decided to generate up to 600 characters, which is fine for a three-minute video.

After generating the text, I used Text-to-speech to convert text to audio and give Arty her voice. With some tweaking of pitch and sample rate, plus randomized patches of silence, she gets a somewhat more pleasant and dynamic character voice.

Here Google-Cloud-Speech kicks in. The audio file I just created is sent to Google Cloud to get the transcription with timed captions for each word. Hint: for audio longer than one minute, Google asks you to upload the file to GC Storage first and pass the bucket link.

Some random noise is always added to the process, simply because this transcription is also probabilistic: it returns different confidence levels and a slightly different transcription every time. There are a lot of bloopers and funny, misheard words that make it almost charming. It's a feature, not a bug! Or so I choose it to be!

After getting the transcript, it's time to extract some nouns and verbs from the text, with an exact notation of when those words appear in the audio. Why? You will see. Here I used another amazing open-source NLP tool called SpaCy to detect and extract phrases from the generated text.

Now, with a nice list of (misheard) phrases and their timing, let's search and download images based on those terms. We also sample and randomize every time so that each search produces different images. The timing is what makes it possible to overlay the correct meme at the exact moment its phrase is uttered in the video.
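As a rough illustration of the timing step, here is a minimal sketch, not the project's actual code. It assumes the transcription has already been reduced to a list of words with start and end times in seconds. The names `Word`, `find_phrase_times` and `overlay_filter` are hypothetical, and the 1.5 second minimum display time is an arbitrary choice. It looks up when each extracted phrase is spoken and builds a filter string for ffmpeg's overlay filter, using its `enable='between(t,...)'` option to show each image only during its time window.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its timing in seconds (assumed input shape)."""
    text: str
    start: float
    end: float


def find_phrase_times(words, phrases):
    """Return (phrase, start, end) for the first occurrence of each phrase.

    Phrases that never appear in the transcript are skipped.
    """
    tokens = [w.text.lower().strip(".,!?") for w in words]
    hits = []
    for phrase in phrases:
        parts = phrase.lower().split()
        n = len(parts)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                hits.append((phrase, words[i].start, words[i + n - 1].end))
                break
    # Order by the moment the phrase is spoken.
    return sorted(hits, key=lambda h: h[1])


def overlay_filter(hits, min_duration=1.5):
    """Build an ffmpeg filter graph that shows image k during hit k.

    Assumes input 0 is the background and inputs 1..N are the images,
    in the same order as `hits`. Returns (graph, final_label).
    """
    chains = []
    prev = "[0:v]"
    for k, (_, start, end) in enumerate(hits, start=1):
        # Keep very short words on screen long enough to be seen.
        end = max(end, start + min_duration)
        out = f"[v{k}]"
        chains.append(
            f"{prev}[{k}:v]overlay=enable='between(t,{start:.2f},{end:.2f})'{out}"
        )
        prev = out
    return ";".join(chains), prev
```

The returned graph string would be passed to ffmpeg's `-filter_complex` option, with the final label mapped as the output video stream and the generated speech added as the audio track.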
These images are then added and mixed with the audio file using a lengthy FFMPEG command and the timed captions. In the end, you get a nice, understandable video with audio of the character talking and almost human-like meme edits, randomly chosen from the internet. Voilà!

Be careful, it may catch you by surprise! Just as misheard lyrics are quirky, if you chuckle at some memes, you may be tricked into thinking Arty is the real deal, at least for a split second. These imperfections are the secret sauce that gives humour to her videos. Can it drag you into its story and then make you laugh? You decide!

The whole procedure of creating a video from a prompt is automated and takes about 10-15 minutes on Google Colab with a good GPU or TPU (recommended). It takes an image and some starting sentences... and you get a YouTube-ready video!

As a final twist and a feature, you can provide your own voice and short text and get a brand new, original video in a matter of minutes, completely replacing the AI part with your own content.

The next step will be to get the subjects of Arty's videos chosen by the people of Twitter!

Final Words

Anyone can experiment with these technologies and create their own videos, add a unique editing style, tell a better story, or just perfect the currently available toolset for even better, more creative results.

Follow Arty on Youtube for the most recent videos! Follow Arty on Twitter!