Solving a Problem with Bayes’ Theorem and Decision Tree

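Learn how to solve a chess-playing problem with Bayes’ theorem and decision trees in this article by Dávid Natingga, a data scientist who earned a master’s in engineering from Imperial College London in 2014, specializing in artificial intelligence.

Playing chess – independent events

Suppose you are given the following table of data. It tells you whether or not your friend will play a game of chess with you outside in the park, based on a number of weather-related conditions:

Now, use Bayes’ theorem to establish whether your friend would like to play a game of chess with you in the park given that the Temperature is Warm, the Wind is Strong, and it is Sunny.

Analysis

In this case, you may want to consider Temperature, Wind, and Sunshine as the independent random variables. The formula for the extended Bayes’ theorem, when adapted, becomes the following:

$$P(Play=Yes \mid Temperature=Warm, Wind=Strong, Sunshine=Sunny) = \frac{R(Play=Yes)}{R(Play=Yes) + R(Play=No)}$$

Here, for each class $c \in \{Yes, No\}$:

$$R(Play=c) = P(Temperature=Warm \mid Play=c) \cdot P(Wind=Strong \mid Play=c) \cdot P(Sunshine=Sunny \mid Play=c) \cdot P(Play=c)$$

Now, count the rows in the table with all values known to determine the individual probabilities.

$P(Play=Yes) = \frac{6}{10} = \frac{3}{5}$, since there are 10 rows with complete data, and 6 of them have the value Yes for the attribute Play.

$P(Temperature=Warm \mid Play=Yes) = \frac{3}{6} = \frac{1}{2}$, since there are 6 rows with the value Yes for the attribute Play and, of those, 3 have the value Warm for the attribute Temperature. Similarly, you’ll have the following:

$$
\begin{aligned}
P(Wind=Strong \mid Play=Yes) &= \tfrac{1}{6} \\
P(Sunshine=Sunny \mid Play=Yes) &= \tfrac{3}{6} = \tfrac{1}{2} \\
P(Play=No) &= \tfrac{4}{10} = \tfrac{2}{5} \\
P(Temperature=Warm \mid Play=No) &= \tfrac{1}{4} \\
P(Wind=Strong \mid Play=No) &= \tfrac{2}{4} = \tfrac{1}{2} \\
P(Sunshine=Sunny \mid Play=No) &= \tfrac{1}{4}
\end{aligned}
$$

Thus:

$$R(Play=Yes) = \tfrac{1}{2} \cdot \tfrac{1}{6} \cdot \tfrac{1}{2} \cdot \tfrac{3}{5} = \tfrac{1}{40} \quad\text{and}\quad R(Play=No) = \tfrac{1}{4} \cdot \tfrac{1}{2} \cdot \tfrac{1}{4} \cdot \tfrac{2}{5} = \tfrac{1}{80}$$

Therefore, you’ll have the following:

$$P(Play=Yes \mid Temperature=Warm, Wind=Strong, Sunshine=Sunny) = \frac{1/40}{1/40 + 1/80} = \frac{2}{3} \approx 67\%$$

This means that your friend is likely to be happy to play chess with you in the park in the stated weather conditions, with a probability of about 67%. Since this is a majority, you could classify the data vector (Temperature=Warm, Wind=Strong, Sunshine=Sunny) as being in the class Play=Yes.

Playing chess – dependent events

Now, suppose that you would like to find out whether your friend would like to play chess with you in a park in Cambridge, UK.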
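Before turning to the Cambridge data, the independent-events arithmetic can be checked in a few lines of Python. This is a minimal sketch, not the article's naive_bayes.py program, and it rests on one assumption: that the example's table holds the same ten complete rows as source_code/3/chess.csv, which appears in the decision tree section. The helper names class_scores and naive_bayes_posteriors are hypothetical. Counting with Python's fractions.Fraction keeps every probability exact, so the sketch shows the unnormalized class scores (prior times the three conditional probabilities) and the posterior of about 67% without any rounding.

```python
from collections import Counter
from fractions import Fraction

# ASSUMPTION: the example's table holds the same ten complete rows as
# source_code/3/chess.csv. Each row: (Temperature, Wind, Sunshine, Play).
ROWS = [
    ("Cold", "Strong", "Cloudy", "No"),
    ("Warm", "Strong", "Cloudy", "No"),
    ("Warm", "None", "Sunny", "Yes"),
    ("Hot", "None", "Sunny", "No"),
    ("Hot", "Breeze", "Cloudy", "Yes"),
    ("Warm", "Breeze", "Sunny", "Yes"),
    ("Cold", "Breeze", "Cloudy", "No"),
    ("Cold", "None", "Sunny", "Yes"),
    ("Hot", "Strong", "Cloudy", "Yes"),
    ("Warm", "None", "Cloudy", "Yes"),
]


def class_scores(rows, query):
    """Hypothetical helper: for each class c, return the exact product
    P(Play=c) * prod_i P(attribute_i = query[i] | Play=c).

    The class label is the last field of each row; `query` gives one value
    per remaining column, in column order.
    """
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, len(rows))  # prior P(Play=label)
        for index, value in enumerate(query):
            matches = sum(1 for row in rows if row[-1] == label and row[index] == value)
            score *= Fraction(matches, count)  # P(attribute=value | Play=label)
        scores[label] = score
    return scores


def naive_bayes_posteriors(rows, query):
    """Normalize the class scores so that they sum to 1 across classes."""
    scores = class_scores(rows, query)
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


query = ("Warm", "Strong", "Sunny")
print(class_scores(ROWS, query))            # {'No': Fraction(1, 80), 'Yes': Fraction(1, 40)}
print(naive_bayes_posteriors(ROWS, query))  # {'No': Fraction(1, 3), 'Yes': Fraction(2, 3)}
```

Dropping the Sunshine column from ROWS and querying ("Warm", "Strong") gives exactly 1/2 for each class with this sketch. Those Temperature and Wind columns match chess_reduced.csv row for row, so this is the exact value behind the 0.49999999999999994 and 0.5 that naive_bayes.py prints later in the article.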
For the Cambridge park, however, you are given different input data:

You may now be wondering how this different data changes the answer to whether your friend would like to play in a park in Cambridge, UK, when the Temperature is Warm, the Wind is Strong, and the Season is Spring.

Analysis

You may be tempted to use Bayesian probability to calculate the probability of your friend playing chess with you in the park. However, you should be careful and ask whether the events are independent of each other.

In the previous example, where you used Bayesian probability, you were given the random variables Temperature, Wind, and Sunshine. These are reasonably independent. Common sense tells you that a specific Temperature and Sunshine do not have a strong correlation with a specific Wind speed. It is true that sunny weather results in higher temperatures, but sunny weather is common even when temperatures are very low. Hence, you considered even Sunshine and Temperature to be reasonably independent random variables and applied Bayes’ theorem.

However, in this example, Temperature and Season are closely related, especially in a location such as the UK, your stated location for the park. Unlike in countries closer to the equator, temperatures in the UK vary greatly throughout the year: winters are cold and summers are hot, while spring and fall have temperatures in between.

Therefore, you cannot apply the extended Bayes’ theorem here, because it treats the random variables as independent and these are dependent. However, you could still perform some analysis using Bayes’ theorem on part of the data. By eliminating enough dependent variables, the remaining ones could turn out to be independent. Since Temperature is a more specific variable than Season, and the two variables are dependent, keep only the Temperature variable. The remaining two variables, Temperature and Wind, can reasonably be treated as independent.
Thus, you get the following data:

You can keep the duplicate rows, as they give you greater evidence of the occurrence of that specific data row.

Input:

Saving the table, you get the following CSV file:

# source_code/2/chess_reduced.csv
Temperature,Wind,Play
Cold,Strong,No
Warm,Strong,No
Warm,None,Yes
Hot,None,No
Hot,Breeze,Yes
Warm,Breeze,Yes
Cold,Breeze,No
Cold,None,Yes
Hot,Strong,Yes
Warm,None,Yes
Warm,Strong,?

Output:

Input the saved CSV file into the program naive_bayes.py and you’ll get the following result:

python naive_bayes.py chess_reduced.csv
[['Warm', 'Strong', {'Yes': 0.49999999999999994, 'No': 0.5}]]

The first class, Yes, is true with a probability of 50%; the tiny difference from 0.5 comes from Python's inexact arithmetic on the float data type. The second class, No, has the same probability, 50%, of being true. Thus, you cannot draw a reasonable conclusion from the data you have about the class of the vector (Warm, Strong). However, you have probably already noticed that this vector already occurs in the table, with the resulting class No. Hence, your guess would be that this vector belongs to the class No. But to have greater statistical confidence, you would need more data or more independent variables.

Playing chess – analysis with a decision tree

Now, find out whether your friend would like to play chess with you in the park. But this time, use decision trees to find the answer:

Analysis

You have the initial set, $S$, of data samples, as follows:

$$S=\{(Cold,Strong,Cloudy,No),(Warm,Strong,Cloudy,No),(Warm,None,Sunny,Yes),(Hot,None,Sunny,No),(Hot,Breeze,Cloudy,Yes),(Warm,Breeze,Sunny,Yes),(Cold,Breeze,Cloudy,No),(Cold,None,Sunny,Yes),(Hot,Strong,Cloudy,Yes),(Warm,None,Cloudy,Yes)\}$$

First, determine the information gain for each of the three non-classifying attributes: temperature, wind, and sunshine.
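Information gain $IG(S, A)$ is the entropy of $S$ minus the size-weighted entropies of the subsets that attribute $A$ splits $S$ into. The minimal sketch below applies that definition to the set $S$ above, assuming entropy is measured in bits (logarithm base 2); the function names entropy and information_gain are hypothetical, not taken from the book's program.

```python
import math
from collections import Counter

# The set S from the analysis: (temperature, wind, sunshine, play).
S = [
    ("Cold", "Strong", "Cloudy", "No"),
    ("Warm", "Strong", "Cloudy", "No"),
    ("Warm", "None", "Sunny", "Yes"),
    ("Hot", "None", "Sunny", "No"),
    ("Hot", "Breeze", "Cloudy", "Yes"),
    ("Warm", "Breeze", "Sunny", "Yes"),
    ("Cold", "Breeze", "Cloudy", "No"),
    ("Cold", "None", "Sunny", "Yes"),
    ("Hot", "Strong", "Cloudy", "Yes"),
    ("Warm", "None", "Cloudy", "Yes"),
]
ATTRIBUTES = ["temperature", "wind", "sunshine"]


def entropy(samples):
    """Hypothetical helper: information entropy, in bits, of the play labels (last field)."""
    counts = Counter(sample[-1] for sample in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(samples, index):
    """IG(S, A) = E(S) - sum over values v of A of |S_v| / |S| * E(S_v)."""
    partitions = {}
    for sample in samples:
        partitions.setdefault(sample[index], []).append(sample)
    weighted = sum(len(part) / len(samples) * entropy(part) for part in partitions.values())
    return entropy(samples) - weighted


print(f"E(S) = {entropy(S):.4f}")  # E(S) = 0.9710
for index, name in enumerate(ATTRIBUTES):
    print(f"IG(S,{name}) = {information_gain(S, index):.4f}")
# IG(S,temperature) = 0.0955
# IG(S,wind) = 0.0955
# IG(S,sunshine) = 0.0464
```

Temperature and wind split $S$ into subsets with the same sizes (3, 4, and 3 samples) and the same class mixes, which is why their information gains tie exactly.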
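The possible values for temperature are Cold, Warm, and Hot. Therefore, you’ll partition the set, $S$, into three sets:

$$S_{cold}=\{(Cold,Strong,Cloudy,No),(Cold,Breeze,Cloudy,No),(Cold,None,Sunny,Yes)\}$$
$$S_{warm}=\{(Warm,Strong,Cloudy,No),(Warm,None,Sunny,Yes),(Warm,Breeze,Sunny,Yes),(Warm,None,Cloudy,Yes)\}$$
$$S_{hot}=\{(Hot,None,Sunny,No),(Hot,Breeze,Cloudy,Yes),(Hot,Strong,Cloudy,Yes)\}$$

Calculate the information entropies for the sets first:

$$
\begin{aligned}
E(S) &= -\tfrac{6}{10}\log_2\tfrac{6}{10} - \tfrac{4}{10}\log_2\tfrac{4}{10} \approx 0.9710 \\
E(S_{cold}) &= -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.9183 \\
E(S_{warm}) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.8113 \\
E(S_{hot}) &= -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} \approx 0.9183 \\
IG(S,temperature) &= E(S) - \left[\tfrac{3}{10}E(S_{cold}) + \tfrac{4}{10}E(S_{warm}) + \tfrac{3}{10}E(S_{hot})\right] \approx 0.0955
\end{aligned}
$$

The possible values for the attribute wind are None, Breeze, and Strong. Thus, you’ll split the set, $S$, into three partitions:

$$S_{none}=\{(Warm,None,Sunny,Yes),(Hot,None,Sunny,No),(Cold,None,Sunny,Yes),(Warm,None,Cloudy,Yes)\}$$
$$S_{breeze}=\{(Hot,Breeze,Cloudy,Yes),(Warm,Breeze,Sunny,Yes),(Cold,Breeze,Cloudy,No)\}$$
$$S_{strong}=\{(Cold,Strong,Cloudy,No),(Warm,Strong,Cloudy,No),(Hot,Strong,Cloudy,Yes)\}$$

The information entropies of the sets are as follows:

$$
\begin{aligned}
E(S_{none}) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.8113 \\
E(S_{breeze}) &= -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} \approx 0.9183 \\
E(S_{strong}) &= -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.9183 \\
IG(S,wind) &= E(S) - \left[\tfrac{4}{10}E(S_{none}) + \tfrac{3}{10}E(S_{breeze}) + \tfrac{3}{10}E(S_{strong})\right] \approx 0.0955
\end{aligned}
$$

Finally, the third attribute, sunshine, has two possible values, Cloudy and Sunny. Hence, it splits the set, $S$, into two sets:

$$S_{cloudy}=\{(Cold,Strong,Cloudy,No),(Warm,Strong,Cloudy,No),(Hot,Breeze,Cloudy,Yes),(Cold,Breeze,Cloudy,No),(Hot,Strong,Cloudy,Yes),(Warm,None,Cloudy,Yes)\}$$
$$S_{sunny}=\{(Warm,None,Sunny,Yes),(Hot,None,Sunny,No),(Warm,Breeze,Sunny,Yes),(Cold,None,Sunny,Yes)\}$$

The entropies of the sets are as follows:

$$
\begin{aligned}
E(S_{cloudy}) &= -\tfrac{3}{6}\log_2\tfrac{3}{6} - \tfrac{3}{6}\log_2\tfrac{3}{6} = 1 \\
E(S_{sunny}) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.8113 \\
IG(S,sunshine) &= E(S) - \left[\tfrac{6}{10}E(S_{cloudy}) + \tfrac{4}{10}E(S_{sunny})\right] \approx 0.0464
\end{aligned}
$$

$IG(S,wind)$ and $IG(S,temperature)$ are greater than $IG(S,sunshine)$. The two are equal, so you can choose either attribute to form the three branches; for example, the first one, temperature. In this case, the three branches hold the data samples $S_{cold}$, $S_{warm}$, and $S_{hot}$. At those branches, you could apply the algorithm further to form the rest of the decision tree. Instead, use the program to complete it:

Input:

source_code/3/chess.csv
Temperature,Wind,Sunshine,Play
Cold,Strong,Cloudy,No
Warm,Strong,Cloudy,No
Warm,None,Sunny,Yes
Hot,None,Sunny,No
Hot,Breeze,Cloudy,Yes
Warm,Breeze,Sunny,Yes
Cold,Breeze,Cloudy,No
Cold,None,Sunny,Yes
Hot,Strong,Cloudy,Yes
Warm,None,Cloudy,Yes

Output:

Classification

Now that you have constructed the decision tree, use it to classify the data sample $(warm,strong,sunny,?)$ into one of the two classes in the set $\{no,yes\}$.

Start at the root. What value does the temperature attribute have in that instance? Warm, so go to the middle branch. What value does the wind attribute have in that instance? Strong, so the instance falls into the class No, since you have arrived at a leaf node.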
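The procedure above (choose the attribute with the highest information gain, split on it, and repeat on each branch) is the ID3 algorithm. The sketch below is not the book's decision tree program; it is a minimal ID3 implementation under stated assumptions, with hypothetical names build_tree and classify. Ties in information gain go to the attribute listed first, which reproduces the choice of temperature at the root, and a node with no attributes left becomes a majority-vote leaf. It then walks the tree for the sample (Warm, Strong, Sunny).

```python
import math
from collections import Counter

# Rows of source_code/3/chess.csv; the class (Play) is the last field.
ATTRIBUTES = ["Temperature", "Wind", "Sunshine"]
ROWS = [
    ("Cold", "Strong", "Cloudy", "No"),
    ("Warm", "Strong", "Cloudy", "No"),
    ("Warm", "None", "Sunny", "Yes"),
    ("Hot", "None", "Sunny", "No"),
    ("Hot", "Breeze", "Cloudy", "Yes"),
    ("Warm", "Breeze", "Sunny", "Yes"),
    ("Cold", "Breeze", "Cloudy", "No"),
    ("Cold", "None", "Sunny", "Yes"),
    ("Hot", "Strong", "Cloudy", "Yes"),
    ("Warm", "None", "Cloudy", "Yes"),
]


def entropy(rows):
    """Entropy, in bits, of the class labels in `rows`."""
    counts = Counter(row[-1] for row in rows)
    return -sum((c / len(rows)) * math.log2(c / len(rows)) for c in counts.values())


def split(rows, index):
    """Group rows by their value at column `index`, in first-seen order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row)
    return groups


def information_gain(rows, index):
    groups = split(rows, index).values()
    return entropy(rows) - sum(len(g) / len(rows) * entropy(g) for g in groups)


def build_tree(rows, attributes):
    """Hypothetical ID3 sketch: returns a class label (leaf) or
    a tuple (attribute name, column index, {value: subtree})."""
    labels = Counter(row[-1] for row in rows)
    if len(labels) == 1 or not attributes:
        return labels.most_common(1)[0][0]  # pure node, or majority vote
    gains = {a: information_gain(rows, ATTRIBUTES.index(a)) for a in attributes}
    top = max(gains.values())
    # ASSUMPTION: (near-)ties go to the attribute listed first.
    best = next(a for a in attributes if gains[a] >= top - 1e-12)
    index = ATTRIBUTES.index(best)
    rest = [a for a in attributes if a != best]
    return (best, index, {v: build_tree(g, rest) for v, g in split(rows, index).items()})


def classify(tree, sample):
    """Follow the branch matching each attribute value until reaching a leaf."""
    while isinstance(tree, tuple):
        _, index, branches = tree
        tree = branches[sample[index]]  # raises KeyError for a value unseen at this node
    return tree


tree = build_tree(ROWS, ATTRIBUTES)
print(tree[0])                                      # Temperature
print(classify(tree, ("Warm", "Strong", "Sunny")))  # No
```

With this tie-breaking rule, the Warm branch splits on wind and the Strong value leads straight to a No leaf, matching the walk-through above. Breaking the root tie in favor of wind would also give No for this sample, because the Strong subset then splits cleanly on temperature and its Warm row is labeled No.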
So, your friend would not want to play chess with you in the park, according to the decision tree classification algorithm. Note that the naive Bayes analysis in the independent-events example reached the opposite conclusion for the same weather conditions, classifying them as Play=Yes with a probability of about 67%. An understanding of the problem is required to choose the best possible method.

If you found this article interesting, you can explore Data Science Algorithms in a Week – Second Edition to build a strong foundation of machine learning algorithms in 7 days. Data Science Algorithms in a Week – Second Edition will help you understand how to choose machine learning algorithms for clustering, classification, and regression, and know which is best suited for your problem.