Overfitting happens when the learning algorithm continues to develop hypotheses that reduce training-set error at the cost of increased error on unseen test data.

Approaches to avoid Overfitting

Pre-pruning stops growing the tree early, before it perfectly classifies the training set.

Post-pruning allows the tree to classify the training set perfectly, and then prunes it back.
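The two approaches can be sketched as follows. This is a minimal illustration, not a full tree learner: the threshold values in should_stop are arbitrary defaults, and accuracy_fn is assumed to be a caller-supplied function that re-scores the whole tree on a held-out validation set (as in reduced-error pruning).

```python
class Node:
    def __init__(self, prediction, children=None):
        self.prediction = prediction    # majority class at this node
        self.children = children or {}  # attribute value -> subtree

    def is_leaf(self):
        return not self.children

def should_stop(depth, n_samples, node_entropy,
                max_depth=4, min_samples=5, min_entropy=0.05):
    # Pre-pruning: halt growth before the training set is fit perfectly.
    # The thresholds here are illustrative, not prescribed values.
    return (depth >= max_depth
            or n_samples < min_samples
            or node_entropy <= min_entropy)

def reduced_error_prune(node, accuracy_fn):
    # Post-pruning (reduced-error pruning): after the full tree is grown,
    # work bottom-up and collapse any subtree into a leaf whenever doing
    # so does not reduce accuracy on the validation set.
    for child in node.children.values():
        reduced_error_prune(child, accuracy_fn)
    if not node.is_leaf():
        before = accuracy_fn()
        saved, node.children = node.children, {}
        if accuracy_fn() < before:
            node.children = saved  # pruning hurt accuracy -> restore
```

Pre-pruning is cheaper but risks stopping too early; post-pruning sees the full tree before deciding what to cut, at the cost of growing it first.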

Incorporating Continuous-valued attributes

Attributes with continuous values, such as Age or Temperature, cannot be used directly for splitting because they can take any value. The standard solution is to define a range in the decision tree itself: pick a threshold dynamically, turning the attribute into a boolean test such as Temperature > 54.
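A common way to pick such a threshold is sketched below: candidate thresholds are the midpoints between adjacent sorted values whose class labels differ, and the one with the highest information gain wins. The function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Sort examples by the continuous attribute, then evaluate a candidate
    # threshold at each boundary where the class label changes.
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 == l2 or v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = (base
                - (len(left) / len(pairs)) * entropy(left)
                - (len(right) / len(pairs)) * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```

For the classic Temperature example (values 40, 48, 60, 72, 80, 90 with labels No, No, Yes, Yes, Yes, No) the candidates are 54 and 85, and 54 gives the higher gain.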

Attributes with many values

If an attribute has many distinct values (such as Date), information gain is biased toward it: splitting on it produces many small, pure partitions that look good on the training set but generalize poorly. This reduces classification accuracy on new data. The usual remedy is the gain ratio, which divides the gain by the attribute's split information, penalizing attributes that fragment the data.
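A minimal sketch of the gain ratio calculation, assuming discrete attribute values and class labels supplied as parallel lists:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attr_values, labels):
    # Gain ratio = information gain / split information.  Split information
    # is the entropy of the partition sizes themselves, so an attribute
    # that shatters the data into many tiny subsets is penalized.
    n = len(labels)
    partitions = {}
    for v, l in zip(attr_values, labels):
        partitions.setdefault(v, []).append(l)
    gain = entropy(labels) - sum(
        (len(p) / n) * entropy(p) for p in partitions.values())
    split_info = -sum(
        (len(p) / n) * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0
```

A perfectly predictive two-valued attribute scores higher here than an attribute with a unique value per example, even though both have maximal plain gain.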

Handling attributes with costs

Sometimes each attribute has an associated measurement cost (for example, a medical test), and the tree should prefer cheap attributes when they are nearly as informative. The solution is to replace the plain Gain calculation with a cost-sensitive measure.
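Two cost-sensitive measures from the decision tree literature can be written down directly; both take the attribute's information gain and its measurement cost:

```python
def tan_gain(gain, cost):
    # Tan's measure: Gain^2 / Cost.  Among attributes with similar gain,
    # the cheaper one scores higher.
    return gain ** 2 / cost

def nunez_gain(gain, cost, w=1.0):
    # Nunez's measure: (2^Gain - 1) / (Cost + 1)^w, where w in [0, 1]
    # controls how strongly cost is weighted against accuracy.
    return (2 ** gain - 1) / (cost + 1) ** w
```

With either measure, halving an attribute's cost raises its score without touching the gain computation itself.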

Handling examples with missing attribute values

Training examples may have missing attribute values. A simple strategy is to fill the gap with the most common value of that attribute among the training examples (or, more finely, among the examples at the same tree node with the same classification).
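The most-common-value strategy is a one-liner over the training rows. This sketch assumes rows are dicts and missing values are stored as None:

```python
from collections import Counter

def fill_most_common(rows, attr):
    # Replace missing (None) values of `attr` with the most common value
    # observed among the remaining training examples.
    counts = Counter(r[attr] for r in rows if r[attr] is not None)
    most_common = counts.most_common(1)[0][0]
    for r in rows:
        if r[attr] is None:
            r[attr] = most_common
    return rows
```

A more refined variant (as used in C4.5) distributes a fractional example down every branch in proportion to the observed value frequencies, rather than committing to a single value.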

Unable to determine the depth of the decision tree

If the attributes are continuous and no stopping criterion is imposed, repeated splitting can in principle go on indefinitely, leading to an unbounded decision tree. Bounding the tree depth or the minimum number of examples per node (i.e. pre-pruning) guarantees that tree building terminates.
