Answer :
The data set is missing from the question; it is given in the attachment.
Solution :
a). In the table, there are four positive examples and five negative examples.
Therefore,
[tex]$P(+) = \frac{4}{9}$[/tex] and
[tex]$P(-) = \frac{5}{9}$[/tex]
The entropy of the training examples is given by :
[tex]$ -\frac{4}{9}\log_2\left(\frac{4}{9}\right)-\frac{5}{9}\log_2\left(\frac{5}{9}\right)$[/tex]
= 0.9911
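This value can be verified with a short sketch (the `entropy` helper below is an illustrative name, not a library function):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# 4 positive and 5 negative examples overall
print(round(entropy([4, 5]), 4))  # → 0.9911
```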
b). For the attribute [tex]$a_1$[/tex], the associated class counts are:

[tex]$a_1$[/tex]   +   -
T     3   1
F     1   4
The entropy for [tex]$a_1$[/tex] is given by:
[tex]$\frac{4}{9}\left[ -\frac{3}{4}\log_2\left(\frac{3}{4}\right)-\frac{1}{4}\log_2\left(\frac{1}{4}\right)\right]+\frac{5}{9}\left[ -\frac{1}{5}\log_2\left(\frac{1}{5}\right)-\frac{4}{5}\log_2\left(\frac{4}{5}\right)\right]$[/tex]
= 0.7616
Therefore, the information gain for [tex]$a_1$[/tex] is
0.9911 - 0.7616 ≈ 0.2294 (using the unrounded values)
Similarly, for the attribute [tex]$a_2$[/tex], the associated class counts are:

[tex]$a_2$[/tex]   +   -
T     2   3
F     2   2
The entropy for [tex]$a_2$[/tex] is given by:
[tex]$\frac{5}{9}\left[ -\frac{2}{5}\log_2\left(\frac{2}{5}\right)-\frac{3}{5}\log_2\left(\frac{3}{5}\right)\right]+\frac{4}{9}\left[ -\frac{2}{4}\log_2\left(\frac{2}{4}\right)-\frac{2}{4}\log_2\left(\frac{2}{4}\right)\right]$[/tex]
= 0.9839
Therefore, the information gain for [tex]$a_2$[/tex] is
0.9911 - 0.9839 ≈ 0.0072
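Both gains can be checked with the same entropy computation, weighting each branch by its fraction of the examples (the helper names below are illustrative, not from any library):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def split_entropy(partitions):
    """Weighted average entropy over the partitions induced by an attribute."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

parent = entropy([4, 5])
# a1: T -> (3+, 1-), F -> (1+, 4-);  a2: T -> (2+, 3-), F -> (2+, 2-)
gain_a1 = parent - split_entropy([[3, 1], [1, 4]])
gain_a2 = parent - split_entropy([[2, 3], [2, 2]])
print(round(gain_a1, 4), round(gain_a2, 4))  # → 0.2294 0.0072
```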
For the continuous attribute [tex]$a_3$[/tex], the entropy and information gain at each candidate split point (the midpoint between adjacent sorted values) are:

[tex]$a_3$[/tex]   Class   Split point   Entropy   Info gain
1.0    +     2.0    0.8484    0.1427
3.0    -     3.5    0.9885    0.0026
4.0    +     4.5    0.9183    0.0728
5.0    -
5.0    -     5.5    0.9839    0.0072
6.0    +     6.5    0.9728    0.0183
7.0    +
7.0    -     7.5    0.8889    0.1022
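The scan over candidate split points can be sketched as follows. The final pair `(8.0, '-')` is an assumption taken from the attached data set so that all nine examples are covered; with it, the computed entropies match the table above:

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Sorted (a3, class) pairs; (8.0, '-') is assumed from the attachment
data = [(1.0, '+'), (3.0, '-'), (4.0, '+'), (5.0, '-'), (5.0, '-'),
        (6.0, '+'), (7.0, '+'), (7.0, '-'), (8.0, '-')]
parent = entropy([4, 5])

best = None
for split in (2.0, 3.5, 4.5, 5.5, 6.5, 7.5):
    left = [c for v, c in data if v <= split]
    right = [c for v, c in data if v > split]
    # Weighted entropy of the two-way split at this threshold
    w = (len(left) * entropy([left.count('+'), left.count('-')]) +
         len(right) * entropy([right.count('+'), right.count('-')])) / len(data)
    gain = parent - w
    if best is None or gain > best[1]:
        best = (split, gain)

print(best[0], round(best[1], 4))  # → 2.0 0.1427
```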
The best split for [tex]$a_3$[/tex] is observed at the split point 2.0, with an information gain of 0.1427.
c). Comparing the information gains computed above, [tex]$a_1$[/tex] gives 0.2294, [tex]$a_2$[/tex] gives 0.0072, and the best split on [tex]$a_3$[/tex] gives 0.1427. Therefore, [tex]$a_1$[/tex] produces the best split.
