Machine Learning can reproduce TED talks, Obama speeches and Death Metal band logos. Could it also replicate my husband’s ramblings?

About a month ago, I graduated from a 12-week coding bootcamp at General Assembly (SG). It was a wonderfully gruelling and intense process, and I now feel a tiny bit like Neo from the Matrix. It’s really cool to look at the source of things and be able to make sense of what’s going on.

Since then, between code katas and job interviews, I’ve had a lot of fun working on little side-projects. Here’s a story about one of my favourites.

It all began when I attended a Creative Coding meetup. I had just spent 12 weeks learning some rather practical skills, and I was eager to see how they could be used for mischief. The theme was Machine Learning.

There were two speakers, and their talks were fascinating. One of them, Rob Peart, wrote about how he used neural networks to generate death metal band logos:

Delightfully ominous. (Image: Rob Peart)

Sounded fun. I wanted to play!

Getting started (dry run)

I followed the installation instructions in Jeff Thompson’s tutorial, and referred to Assad Udin’s installation guide when I ran into trouble. Like Rob said, training on Shakespeare gets boring pretty quick. I didn’t really want to use images either, so I started looking around.

Finding other examples to learn from

I discovered two projects I really loved: samim’s TED-RNN (trained on over 4 million words!) and Obama-RNN (>700k words).

Now, it just so happens that my favourite human being in the world is 66.2% through a project to write a million words. That’s at least 662k words, good enough for what I wanted to do.

Getting started… for real this time

First, I exported the blog from WordPress, getting an XML file with over 60,000 lines. Next, I spent about 2 hours cleaning it up. Removing the metadata was easy; removing formatting from within posts was harder. I got lots of regexp practice! Once that was done, it was training time.
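To give a flavour of that cleanup step, here’s a minimal sketch in Python. The tag names follow a standard WordPress export (WXR), but the tiny sample document and the entity handling are illustrative stand-ins, not the exact regexes I used:

```python
import re
import xml.etree.ElementTree as ET

# Tiny stand-in for the real export, which had over 60,000 lines.
WXR_SAMPLE = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>On procrastination</title>
      <content:encoded><![CDATA[<p>Let&#8217;s talk about <em>procrastination</em>.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

# WordPress stores each post body in a namespaced <content:encoded> element.
CONTENT_TAG = "{http://purl.org/rss/1.0/modules/content/}encoded"

def extract_posts(wxr_text):
    """Pull the post bodies out of the export, ignoring all the metadata."""
    root = ET.fromstring(wxr_text)
    return [node.text or "" for node in root.iter(CONTENT_TAG)]

def strip_formatting(html):
    """Drop inline HTML tags and decode one common entity."""
    text = re.sub(r"<[^>]+>", "", html)   # remove tags like <p> and <em>
    text = text.replace("&#8217;", "'")   # curly apostrophe
    return text.strip()

corpus = "\n\n".join(strip_formatting(post) for post in extract_posts(WXR_SAMPLE))
```

The nice part of going through the XML tree first is that all the metadata (titles, dates, comment settings) falls away for free, and the regexes only have to deal with formatting inside the posts.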
Training the Shakespeare data set took about an hour, so I wasn’t expecting this to take much longer… Boy, was I wrong. It took 5 full hours!

At long last…

Creating my first sample was super quick… and then I got a bunch of disappointing rubbish.

“Play I tencid a plistance of good how from a bit of tirely problems we imagine from that 4n.”

This didn’t look anything like the results from Obama-RNN… Time to actually read the documentation.

Digging into the details

In Karpathy’s article on the effectiveness of RNNs, he explains that setting the temperature changes how a sample is created:

“Lower temperature will cause the model to make more likely, but also more boring and conservative predictions. Higher temperatures cause the model to take more chances and increase diversity of results, but at a cost of more mistakes.”

That made a lot of sense. The samples below are 6000-character chunks of text starting with “Let’s talk about procrastination”, one at each temperature setting.

Displaying the information in a way that makes sense

I wanted a way to visualise what was happening, so… word clouds! (No need to reinvent the wheel here; wordclouds.com is free and does exactly what it says on the tin.)

temperature 0.1 to temperature 0.3

The conservative samples (1 and 2) were repetitive and boring. The results were long chunks of text, with no punctuation, sentences or paragraphs. In the first sample, the word ‘world’ appeared over 120 times! Sample 3 was the first one to return made-up words (‘confriend’ and ‘boathe’).

temperature 0.4 & 0.5

Sample 4 was the first one with paragraphs! There were only 2, and they were ridiculously long 3000-character sentences. There were also about 10 semi-gibberish words, but they still looked familiar (windom = wisdom, priviled = privilege, speaked = speak). As we get less conservative, the generator tries more ‘risky’ things.
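Under the hood, temperature is just a rescaling of the network’s predicted next-character probabilities before one character is sampled. A quick sketch of the idea in Python (the three-way distribution below is made up for illustration; char-rnn does this over its full softmax output):

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a next-character distribution by temperature.

    Low temperature sharpens it (likely but boring predictions);
    high temperature flattens it (more diversity, more mistakes).
    """
    exps = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng=random):
    """Draw one index from a probability distribution."""
    r, cumulative = rng.random(), 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

# Made-up distribution over three candidate characters:
probs = [0.6, 0.3, 0.1]
cold = apply_temperature(probs, 0.1)  # near-greedy: mass piles onto the top choice
hot = apply_temperature(probs, 2.0)   # flatter: rarer characters get a real chance
```

That’s why the low-temperature samples kept repeating the same safe words, while the high-temperature ones happily gambled on characters the model was much less sure about.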
best results at temperature 0.6 & word cloud of the original text (>600k words)

Sample 6 is, in my opinion, the best result I got. It has bits that actually sound like a Lorem Ipsum version of Visa. Some of my favourite lines:

“think about it, I was a teenager and the fact that it is to do things and drink the guy out of the conversations and off the same thing”

“So I suck on things that I wanted to read out of down at great me to that I wanted to get the optimal starts of how to be an only that I should be a conversation in the days of external to truly write a lot of the reality of my playing something outside for the most is a break with seemingly the more less to the end of the real.”

“better in a sense of thinking and building so I feel like I should be writing everything that I love having my responsible of anybody would actually interesting to me I feel like I’ve present for it– and then I wanted to do the same than a communicate early and learn and the problem of realize that you know thinking. I think it’s not really feel good right.”

“some interesting for our curiosity to the world that I want to figure and think about this makes me this possibly love what they get the idea”

temperatures 0.7–0.9

temperature 1.0 & the resulting output

Samples 7–10 are increasingly garbage, and pretty hard to read:

- With 9 & 10, sentences became shorter, choppy and abrupt
- Punctuation seems a bit random, with question marks and dashes showing up
- There are more and more gibberish words; the word clouds really help visualise this!

What I learnt, what I loved & what I would do better

“Gasp. Language!” I love, love, love that swear words appeared. I’m not sure what this says about me. (“fuck” appeared at temperature 0.3 and “bitch” appeared at 0.5.)

I think the results here aren’t as great as TED-RNN’s and Obama-RNN’s because those examples learnt from a much more consistent format. Visakan, on the other hand, uses this blog to write anything and everything he wants. In retrospect, the model might’ve done a better job if I had picked certain categories of posts rather than all of them.

I laughed way too hard at this.

To be honest, I still can’t entirely explain what a neural network is, or how exactly machine learning works. But I feel like I’m somewhere in the ballpark now, rather than a total outsider wondering what everyone’s on about. Multi-layer Recurrent Neural Networks! Sequential processing! Optimization algorithms! Corpus!

My little toy probably isn’t going to be writing blogposts for Visakan anytime soon, but it sure was a lot of fun to build and tinker with!

You can check out the complete text samples here and my portfolio here.