

We are taking the below famous dataset, which is widely used for explaining the Decision Tree algorithm. Once we build a decision tree on it, it looks like below.

The first and foremost question is: how did I choose Outlook as my root node? Now we will see how we arrived at the above decision tree using the Entropy and Information Gain metrics. We need to calculate these for each variable; in our case the variables are outlook, temperature, humidity and wind, and play is the variable which we need to predict. The variable with the highest information gain value becomes the root node.

Step 1) Calculate Entropy:

First we need to calculate Entropy for our dependent/target/predicted variable. If all predicted/dependent values are the same (either all zero or all one), then entropy will be zero. Let's say in our dataset the play variable contains only one value, either YES or NO; then entropy is zero, which is also called Pure. If half of the predicted/dependent values are zero and the remaining half are one, then entropy will be 1. Let's say in our dataset the play variable contains an equal number of YES and NO values; then entropy is one, which is also called impure.

Less Entropy = Less Information Missing = Greater Certainty = Greater Purity
More Entropy = More Information Missing = Less Certainty = Less Purity

In our dataset we have 9 YES and 5 NO out of 14 observations, so p1(Yes) = 9/14 and p2(No) = 5/14, and E(S) = -p1*log2(p1) - p2*log2(p2). From the above equation we get the entropy value E(S) = 0.94. In our case we have two classes/binary values; if we had more than two classes we would also need a term for each of them, like what we did for p1(Yes) and p2(No).

Step 2) Calculate Information/average entropy and gain:

Information gain can be defined in many ways; in simple terms, it measures how much a variable reduces the impurity of the target. We need to calculate information and gain for each and every variable, and whichever variable gives the highest value we consider as the root node.

Below is the formula to calculate the Weight of Evidence or Information or Average Entropy: I(Variable) = sum over its values v of (|Sv|/|S|) * E(Sv), i.e. the entropy of each subset weighted by the fraction of records falling in it.

Let's calculate Information gain for Outlook. We need to group the data based on its categorical values, in our case Sunny, Overcast and Rainy, as shown in the below image. Now we get the Gain value for Outlook as 0.24:

Gain = G(Outlook) = Entropy - Information = E(S) - I(Outlook) = 0.94 - 0.693 = 0.24 (we calculated E(S) in Step 1)

Similarly calculate for temperature, humidity and wind; the results are shown below. If we see the above screenshot, Outlook has the highest Information Gain, so we consider it as the root node. As Outlook has three categorical values (Sunny, Overcast, Rainy), our tree now looks like below.

We also need to consider the remaining variables as part of our decision tree. Now how do we select the remaining variables? As we know, we have three variables left (temperature, humidity and wind), and we need to calculate Information Gain for them again within each branch, as shown in the sketch below.
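To make the two steps concrete, here is a minimal Python sketch (not from the original post) that reproduces the numbers above. The 14 rows are the commonly published version of the Play Tennis data, so treat the exact values as an assumption if your copy of the dataset differs; the helper names entropy and information are just illustrative.

```python
import math
from collections import Counter

# The classic Play Tennis table (outlook, temperature, humidity, wind, play).
# Assumed rows: the commonly published version, giving the 9 YES / 5 NO split.
data = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rainy",    "Mild", "High",   "Weak",   "Yes"),
    ("Rainy",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rainy",    "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rainy",    "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rainy",    "Mild", "High",   "Strong", "No"),
]
features = ["outlook", "temperature", "humidity", "wind"]  # play is the last column


def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information(rows, col):
    """Weighted average entropy I(variable) = sum over values v of (|Sv|/|S|) * E(Sv)."""
    total = len(rows)
    info = 0.0
    for value in {row[col] for row in rows}:
        subset_labels = [row[-1] for row in rows if row[col] == value]
        info += (len(subset_labels) / total) * entropy(subset_labels)
    return info


# Step 1: entropy of the target (play) column.
e_s = entropy([row[-1] for row in data])
print(f"E(S) = {e_s:.3f}")

# Step 2: gain = E(S) - I(variable) for every candidate split.
for i, name in enumerate(features):
    print(f"Gain({name}) = {e_s - information(data, i):.3f}")

# Selecting the next split: repeat the same calculation inside each Outlook branch.
# Overcast is already pure (all Yes), so only Sunny and Rainy need another test.
for branch in ("Sunny", "Rainy"):
    subset = [row for row in data if row[0] == branch]
    branch_entropy = entropy([row[-1] for row in subset])
    for i, name in enumerate(features[1:], start=1):  # skip outlook, already used
        print(f"{branch}: Gain({name}) = {branch_entropy - information(subset, i):.3f}")
```

With these rows the script prints E(S) = 0.940 and roughly 0.247 for Gain(outlook), the largest of the four, matching the choice of Outlook as the root; inside the Sunny branch humidity gives the highest gain and inside the Rainy branch wind does, which is how the lower levels of the tree are selected.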
