From “What is a Markov Model” to “Here is how Markov Models Work”

To be honest, if you are just looking to answer the age old question of “what is a Markov Model” you should take a visit to Wikipedia (or just check the TLDR 😉), but if you are curious and looking to use some examples to aid in your understanding of a Markov Model is, Markov Models Matter, and a Markov Model stick around :) what why how to implement Show > Tell “In , a is a used to randomly changing systems where it is assumed that future states depend only on the current state not on the events that occurred before it (that is, it assumes the ). Source: TLDR: probability theory Markov model stochastic model model Markov property https://en.wikipedia.org/wiki/Markov_model Roadmaps are great! Check out this for this article’s roadmap 🚗 table of contents Table of Contents A. Intro To Markov Models 👶 1. Starter Sentence2. Weighted Distributions 3. Special Additions4. a Markov Model Works5. Full Example Summary How B. Further Markov Model Topics 😼 1. Larger Example2. Distribution 3. Bigger Windows C. Implementation - Python 🐍 1. Dictogram Data Structure 2. Hash Table Data Structure 3. Markov Model Structure 4. Parse Markov Model D. Further Readings, Suggestions, Thoughts 💚 1. Applications2. Further Reading3. Final Thoughts Intro To Markov Models (☝️ 🐠 ✌️ 🐠 ⭕️ 🐠 🌀 🐠) | Definitely the best way to illustrate Markov models is through using an example. In this case we are going to use the same example that I was also presented when learning about Markov Models at . 1. Starter Sentence Make School “One fish two fish red fish blue fish.” -Dr. Seuss 🎩 | Before we jump into Markov models we need to make sure we have a strong understanding of the and . 2. Weighted Distributions given starter sentence, weighted distributions, histograms Starter Sentence Cool, our starter sentence is a well known phrase and on the surface nothing may explicitly jump out. But we are going to break it down and look at what this exact sentence. Specifically, it consists of eight words ( ) but only five unique words ( ). composes tokens keys Colored Starter Sentence Here I gave each unique word ( ) a different color and on the surface this is now just a colored sentence…but alas, there is more meaning behind coloring each differently. By coloring each unique differently we can see that certain appear much more often than others. key key key keys Distribution of keys By looking at the above of we could deduce that the fish comes up 4x as much as any other This type of statement can led us to even further predictions such as if I randomly had to pick the next word at any point in the starter sentence my best guess would be saying “fish” because it occurs significantly more in the sentence than any other word. 💯In our situation are the percentage that one key will appear is based on the total amount of times a key shows up divided by the total amount of tokens. For example, the weighted distribution for fish is 50% because it occurs 4 times out of the total 8 words. Then One, two, red, blue all have a 12.5% chance of occurring (1/8 each). distribution keys key key. weighted distributions are a way to represent weighted distributions, often they are a plot that enables you to discover the underlying frequency distribution of a set of continuous data. In our case the continuous data is a because a sentence consists of many words (continuous data). Histograms sentence Histogram for our starter sentence By looking at the of our starter sentence we can see the underlying distribution of words Clearly, fish appears more than anything else in our data set 🐠🐟 🐡🐠 histogram visually 👀 | Great! At this point you should be comfortable with the concept that our consists of many and Additionally, you should understand the relationship between a and . 3. Special Additions sentence tokens keys. histogram weighted distributions ✅💯✅ ✅💯✅ Extra Example A is any word in the sentence.A is a unique occurrence of a word. there are two and five The keys are “Fish” and “Cat” (🐠 and 😺). Then any word is a token.A is related to because a histogram visually shows the frequency of data in a continuous data set and in essence that is demonstrating the weighted distribution of the data. token key Example: “ Fish Fish Fish Fish Cat” keys tokens. histogram weighted distibutions ✅💯✅ ✅💯✅ End Cool, so now we understand our sentence at the surface and how certain words occur more than others But before we continue we need to add some special additions to our sentence that are hidden on the surface but we can agree are there. The and of the sentence… Start End Starter Sentence with Special Additions Awesome! Take a moment and check out the above “additions” to the sentence that exist. This may seem unnecessary right now, but trust me, this will make exponentially more sense in the next part where we dive into Markov models 😌. In summary, every sentence proceeds by an invisible “*START*” symbol and it always concludes with an “*END*” symbol. Distribution of Keys in Modified Example Above, I went ahead and recreated the same distribution of keys from earlier but included our two additional keys (*START* and *END*). | Fantastic! You already may have learned a few things, but now here comes the meat of the article. Lets start from a high level definition of a Markov Model is (according to ): 4. a Markov Model Works How What Wikipedia “A is a stochastic model used to model randomly changing systems where it is assumed that (that is, it assumes the ). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable.”- Markov model future states depend only on the current state not on the events that occurred before it Markov property Wikipedia Awesome! Sounds interesting…but what does that huge blob even mean? I bolded the portion of a Markov Model is. In summary, a Markov Model is a model where the is solely chosen based on the . critical what next state current state One way to think about it is you have a window that only shows the current state (or in our case a single token) and then you have to determine what the next token is based on that small window! Demonstrating the next word using a window Above, I showed how each token leads to another token. Additionally, I colored the arrow leading to the next word based on the origin key. I recommend you spend some time on this diagram and the following ones because they build the foundation of Markov Models work! 🤓 how Pairs! Every Token leads to another Token! You may have noticed that every token leads to another one (even the *END*, leads to another token — ). In this case it forms pairs of one token to another token! none Organized pairs by starting word Above, I simply organized the pairs by their first token. At this point you may be recognizing something interesting 🤔 Each starting token is followed by a possible key to follow it… only Possible tokens to follow each key Ok, so hopefully you have followed along and understood that we are organizing pairs which we formed by using a to look at what the next token is in a pair. Then above I trimmed the pairs down even further into something very interesting. Every is matched with an array of possible tokens that could follow that key. “window” key ✨💡✨ ✨💡✨ Thinking Break Let’s take a moment to think about the above diagram. Every has possible words that follow it. If we were to give this structure from above to someone they could potentially recreate our original sentence! key could Example: We give them * to begin with, then we look at the potential options of words that follow *START* → [One]. Being that there is only key that follows we have to pick it. Our sentence now looks like “One.” Let’s continue by looking at the potential words that follow “One” → [fish]. Well again, that was easy only “fish” can follow One. Now our sentence is “One fish.” Now let’s see what follow “fish” → [two, red, blue, *END*]. Here is where things get interesting any of these four options be picked next 😳. Which means we pick “two” and then continue and potentially get our original sentence…but there is a 25% (1/4) chance we just randomly pick “*END*”. If this was the case we would have used our original structure and randomly generated a sentence very different than our original → “One fish.” 1️⃣ 🐠 Start* could could could could could ✨💡✨ ✨💡✨ End Congrats! 🎉🎉 You secretly just acted out a Markov Model in the above 😏. Nice! But seriously…think about it. We used the (current key) to determine our . Further our next state could only be a key that follows the current key. Sounds cool, but it gets even cooler! Let’s diagram a Markov Model for our starter sentence. Thinking Break current state next state Markov Model for the Starter Sentence Yikes 🙃 How does the above diagram represent what we just did? Look closely, each with a word inside it represents a with the pointing to potential that can follow it! But wait it gets even cooler: oval key arrows keys Markov Model for the Starter Sentence with Probability Yep! Each arrow has a probability that it will be selected to be the path that the current state will follow to the next state. Full Gif of the Starter Sentence being Created Using A Markov Model. Awesome! In summary, we now understand and have illustrated a Markov Model by using the Dr. Seuss starter sentence. As a , the data you use to create your model is often referred to as a fun fact corpus 👻 | You made it! Congrats again 🏅 at this point you likely can describe what a Markov Model is and even possibly teach someone else how they work using this same basic example! You my friend are going places 🚀. But guess what! This was just the beginning of your fuller understanding of Markov Models in the following sections we will continue to grow and expand your understanding :) Remember distributions? Well we are going to use them in the next example to show how to use to create a more accurate model; Further, we will talk about 😲 (bigger is better, right? 🙄); and lastly we will implement a nifty Markov Model in Python 🐍. So buckle up and enjoy the ride 🔥🎢 5. Full Example Summary weighted distributions potentially bigger windows Further Markov Model Topics 😼 🦊 I am going to be following the same process as above for creating the Markov Model, but I am going to omit some steps. If something appears confusing refer back to the first section 🔥 ** ** Disclaimer | Keeping in the spirit of Dr. Seuss quotes I went ahead and found four quotes that Theodor Seuss Geisel has 1. Larger Example immortalized: “Today you are you. That is truer than true. There is no one alive who is you-er than .” you “You have brains in your head. You have feet in your shoes. You can steer yourself any direction you choose. You’re on your own.” “The more that you read, the more things you will know. The more that you learn, the more places you’ll go.” “Think left and think right and think low and think high. Oh, the thinks you can think up if only you try.” -Dr. Seuss 🎩 The biggest difference between the original starter sentence and our new sentence is the fact that some follow different keys a of times. For example “more” follows “the” four times. So what will this additional complexity do to our Markov Model construction? 🤔 Well overall it our logical outcome for our sentences. What I mean by that is: There are certain words in the english language (or any language for that matter 🇷🇴) that come up wayyyy more often than others. For example the word “a” comes up significantly more in day to day conversation than “wizard” 🎩. Just how the world works 🌎 With that in mind, knowing how often in comparison one key shows up vs a different key is to seeming more realistic 😊 This is known as taking the into account when deciding what the next should be in the Markov Model. keys variable amount can improve critical weighted distribution step One way to programmatically 💻 represent this would be for each that follows a you store the and the amount of of that key! This can be done via having a dictionary and the dictionary would represent the and then have the of that be another dictionary that store the that follow as and their would be the **amount of occurrences…**Does this remind you of something we already talked about 📊? ! Exactly! The is severing as a histogram - it is soley keeping track of and their ! Wow, ok so many keys 🔑 were brought up and dictionaries too if you are curious about the code you should certainly check it out below👇 But otherwise, just recognize that in order to create a more advanced model we need to track what proceed other and the of these keys. key window keys occurrences key current window value dictionary key unique tokens keys values Histograms inner dictionary keys occurrences keys keys amount of occurrences | Awesome, quick tangent and then we will start tearing into this example 😋🍴 Cool so even this data set is small to be a good corpus! Why? Well, we will get different distribution of words which is great and will impact the entire structure, but in the larger scope of generating generated sentences you should aim to have at tokens. It would be better if you would have at least tokens. Then if you want to have a truly spectacular model you should aim for + tokens 🚀 2. Distribution very natural unique minimum 20,000 100,000, 500,000 But lets chat about how the distribution of words are in a with this larger example. one key window Colored distribution of keys Wow! Very cool 😎 Look at all that data - I went ahead and cleaned the data up and now you can see that each unique in our corpus has an array of all of the and that follow the unique key. Very nice! key keys occurrences Want to know a little secret? 😏 There is very little difference between this and the previous Markov model because in both situations we make decisions on the next step but storing the distribution of words allows us to weight the next step. Lets look at a real example from our data: solely based on the current status more : [things : 1, places : 1, that : 2] Awesome! 👏 So if the Markov Model’s current status was “more” than we would randomly select one of the following words: “things”, “places”, and “that”. However, “that” appears twice as opposed to “things” and “places” which occur once. Therefore, there is a 50% chance “that” would be selected and a 25% that either “things” or “places” is selected! 😃 think : [high : 1, up : 1, right : 1, low : 1, left : 1] ☝️☝️☝️ Awesome, similar example as above, but in this case “high”, “up”, “right”, “low”, and “left” all have a 20% chance of being selected as the next state if “think” is the current state! Make sense? 💭 | Currently, we have only been looking at markov models with of size one. We could increase the size of the window to get more . By more accurate I mean there will be less randomness in the generated sentences by the model because they will be closer and closer to the original corpus sentences. This can be good or bad 😇 😈 This is because if your purpose of the Markov Model is to generate some truly unqiue random sentences it would need to be a smaller window. A larger window is only a good idea if you have a large corpus 3. Bigger Windows windows “accurate” sentences significantly 100,000+ tokens. Increasing the size of the window is known as bringing the Markov Model to a . The current examples we have worked with have been If we use a second order Markov Model our window size would be two! Similarly, for a third order → window size of three. “higher order” first order markov models. The window is the data in the current state of the Markov Model and is what is used for decision making. If there is a bigger window in a smaller data set it is unlikely that there will be large unique distributions for the possible outcomes from one window therefore it could only recreate the same sentences. Let’s look at our original example with a - window of size two! 2️⃣ second order Markov Model 2nd order Markov Model Data Very interesting! Any observations? 🕵️ You may have noticed that every unique window of size two only has …therefore no matter where we start we will always get the same sentence because there is no possibility of deviating off the original path. There is a 100% chance we generate the same sentence 👎 Not great. This reveals a potential issue you can face with Markov Models…if you do not have a large enough corpus you will likely only generate sentences within the corpus which is not generating anything unique. Get a and then with using of the Markov Model 👍 one possible outcome huge data set - 500,000+ tokens play around different orders Implementation — Python 🐍 | The Dictogram purpose of the Dictogram is to act as a histogram but have incredibly fast 💨 and constant look up times regardless how large our data set gets. Basically it is a histogram built using a dictionary because dictionaries has the unique property of having constant lookup time O(1)! 1. Dictogram Data Structure The dictogram class can be created with an iterable data set, such as a list of words or entire books. I keep track of and count as I create it just so I can access those values without having to go through the entire data set 🤓 token key It is also good to note that I made two functions to return a random word. One just and the other function takes into account the amount of occurrences for each word and then returns a picks a random key weighted random word! 🏋️♀️ | Wow! It has been quite a journey to go from is a Markov Model to now be talking about implement a Markov Model 🌄 2. Markov Model Structure what how to In my implementation I have a that stores as the and then the for each is a . Basically I store a histogram of words for each window so I know what the next state can be based on a current state 😌 We increment the data in the dictogram for a key if it already exists in the current window! dictionary windows key in the key-value pair value key dictogram **3. Nth Order Markov Model Structure |**Some of you are definitely curious about how to so I also included how I went about doing that 😏 implement higher order Markov Models ☝️☝️☝️☝️☝️ Very similar to the first order Markov Model, but in this case we store a as the in the in the . We do this because a is a great way to represent a single list. And we use a tuple instead of a list because a key in a dictionary should not change and tuples are immutable sooo 🤷♂️ tuple key key-value pair dictionary tuple |Yay!! 🎉👏 🎉 We now have implemented a dictogram, but how do we now do the thing where we generate content based on and to a new state? 👇👇👇Here we will walk through our model 🚶 4. Parse Markov Model current status step Great, so I wanted to be able to only use valid starting sentence words so I checked anything in the key dictogram 🐶. Otherwise, you start the generated data with a (which I generate from valid starts), then you just keep looking at the (by going into the dictogram for that key) that could and based on and ( ). We keep repeating this until we do it times! 💯 personally END starting state possible keys follow the current state make a decision probability randomness weighted probability length Further Readings, Suggestions, Thoughts 💚 | Some classic examples of Markov models include peoples actions based on weather, the stock market, and tweet generators! 🐣 Think about how you could use a to create and based on a . Think about what would change? Hint: Not too much, if you have a solid understanding of what, why, and how Markov Models work and can be created the will be and if you add any 1. Applications corpus generate new content Markov Model only difference how you parse the Markov Model unique restrictions. For example, in my dope I used a larger window, limited all my generated content to be less than 140 character, there could be a variable amount of sentences, and I used only existing sentence starting windows to “seed” the sentences. 🌱 silicon valley tweet generator | Now that you have a good understanding of what a Markov Model is maybe you could explore how a Hidden Markov Model works. Or maybe if you are more inclined to build something using your new found knowledge you could read my artcile on building a HBO Silicon Valley Tweet Generator using a markov model (coming soon) ! 2. Further Reading | I am always looking for feedback so please feel free to share your thoughts on how the article was structured, the content, examples, or anything else you want to share with me 😊 Markov Models are great tools and I encourage you to build something using one…maybe even your own tweet generator 😜 Cheers! 🤗 3. Final Thoughts If you liked this article, click the💚 below so other people will see it here on Medium. Many thanks to and for providing edits, insights, and encouraging me to write this article 💎 Kenny Batista Eric Wong