Overfitting happens when the learning algorithm keeps developing hypotheses that reduce training-set error at the cost of increased test-set error.
Approaches to avoid overfitting
Pre-pruning stops growing the tree early, before it perfectly classifies the training set.
Post-pruning lets the tree grow until it perfectly classifies the training set and then prunes it back.
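The following Python sketch illustrates pre-pruning, assuming a simple ID3-style learner over categorical attributes stored in dicts. The names `build_tree`, `tree_depth`, `max_depth` and `min_samples` and the toy weather data are hypothetical, chosen only for illustration. Growth stops when a node is pure, when no attributes remain, when the depth limit is reached, or when too few examples reach the node; post-pruning is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs, depth=0, max_depth=3, min_samples=2):
    """Grow a tree as nested dicts; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Pre-pruning: stop on a pure node, no attributes left, depth limit, or too few examples.
    if len(set(labels)) == 1 or not attrs or depth >= max_depth or len(labels) < min_samples:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attr": best, "branches": {}, "default": majority}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best], depth + 1, max_depth, min_samples)
    return node

def tree_depth(tree):
    """Number of decision levels below the root (a single leaf has depth 0)."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max(tree_depth(child) for child in tree["branches"].values())

# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
]
labels = ["no", "no", "yes", "no", "yes", "yes"]

full = build_tree(rows, labels, ["outlook", "windy"], max_depth=3)
shallow = build_tree(rows, labels, ["outlook", "windy"], max_depth=1)
print(tree_depth(full))     # 2
print(tree_depth(shallow))  # 1
```

With `max_depth=1` the tree stops after the root split on `outlook` and labels each branch with its majority class, even though the `rain` branch is not pure.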
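Incorporating continuous-valued attributes
Attributes with continuous values, such as Age or Temperature, cannot be used directly as decision nodes: each branch of the tree must correspond to a discrete outcome, but a continuous attribute can take any value. The solution is to define ranges in the decision tree itself, so that a node tests the attribute against a threshold (for example, Temperature > t for some value t), which turns the continuous attribute into a discrete one.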
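A minimal sketch of choosing such a threshold, assuming a single numeric attribute and its class labels held in plain Python lists: sort the examples by value, try the midpoint between each pair of adjacent distinct values, and keep the candidate with the highest information gain. The function `best_threshold` and the sample temperatures are illustrative, not taken from the original text.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the binary test `value > threshold` with the highest gain.

    Returns (None, 0.0) if no threshold reduces entropy (e.g. all labels are equal).
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no threshold can separate identical values
        t = (v1 + v2) / 2
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (t, gain)
    return best

temps = [40, 48, 60, 72, 80, 90]
play = ["no", "no", "yes", "yes", "yes", "no"]
t, g = best_threshold(temps, play)
print(t, round(g, 3))  # 54.0 0.459
```

Restricting candidates to midpoints where the class label changes is a common optimization; this sketch tries every midpoint for simplicity.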
Attributes with many values
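If an attribute has many distinct values, Gain tends to select it, because splitting on it divides the training set into many small, nearly pure subsets. An extreme example is a Date attribute, which can perfectly separate the training examples yet says nothing useful about new ones. This reduces classification accuracy on unseen data. A common remedy is gain ratio, which divides Gain by the split information (the entropy of the attribute's own value distribution) and so penalizes attributes with many values.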
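The Python sketch below compares plain information gain with gain ratio on a small hypothetical dataset. The attribute `day` is unique for every example, like a Date or ID column, while `windy` is a genuine but imperfect predictor; both attributes and the helper names are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a non-empty list."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def gain(values, labels):
    """Information gain of splitting `labels` by the attribute `values`."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(v, []).append(c)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def gain_ratio(values, labels):
    """Gain divided by split information (entropy of the attribute's own values)."""
    split_info = entropy(values)
    return gain(values, labels) / split_info if split_info > 0 else 0.0

labels = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
day = [1, 2, 3, 4, 5, 6, 7, 8]  # unique per example
windy = ["no", "yes", "no", "yes", "no", "yes", "no", "no"]

print(round(gain(day, labels), 3), round(gain(windy, labels), 3))              # 1.0 0.549
print(round(gain_ratio(day, labels), 3), round(gain_ratio(windy, labels), 3))  # 0.333 0.575
```

Plain gain prefers `day` because each of its eight branches is pure, while gain ratio divides that gain by a split information of 3 bits and prefers `windy` instead.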
Handling attributes with costs
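Attributes can differ in the cost of obtaining their values; in medical diagnosis, for example, a blood test costs more than taking a temperature. Plain Gain ignores these costs, so the tree may rely on expensive attributes where cheaper ones would classify almost as well. The solution is to replace the Gain calculation with a cost-sensitive measure, for example Gain²(S, A) / Cost(A), which favors low-cost attributes.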
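A minimal sketch of cost-sensitive attribute selection in Python. It uses two measures described in the decision-tree literature: Tan and Schlimmer's Gain² / Cost, and Nunez's (2^Gain − 1) / (Cost + 1)^w with 0 ≤ w ≤ 1. The attribute names, gains and costs are hypothetical numbers chosen for illustration, and costs are assumed to be positive.

```python
def gain_squared_over_cost(gain, cost):
    """Tan and Schlimmer's measure: Gain(S, A)^2 / Cost(A); assumes cost > 0."""
    return gain ** 2 / cost

def nunez_measure(gain, cost, w=1.0):
    """Nunez's measure: (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with 0 <= w <= 1."""
    return (2 ** gain - 1) / (cost + 1) ** w

# Hypothetical candidates: attribute -> (information gain, cost of measuring it).
candidates = {
    "blood_test": (0.40, 10.0),
    "temperature": (0.30, 1.0),
}

by_gain = max(candidates, key=lambda a: candidates[a][0])
by_cost = max(candidates, key=lambda a: gain_squared_over_cost(*candidates[a]))
by_nunez = max(candidates, key=lambda a: nunez_measure(*candidates[a]))
print(by_gain, by_cost, by_nunez)  # blood_test temperature temperature
```

Plain Gain picks the expensive `blood_test`, while both cost-sensitive measures trade a little gain for a much cheaper attribute. The weight `w` in Nunez's measure controls how strongly cost is penalized.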
Handling examples with missing attribute values
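The training set may contain examples with missing values for some attribute. A simple way to handle this is to fill in the most common value of that attribute among the training examples, or among the examples that have the same class as the tuple in consideration.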
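A minimal Python sketch of most-common-value imputation, assuming each example is a dict and a missing value is stored as `None`. The helper `fill_most_common` and the sample data are hypothetical; passing `labels` switches to the class-conditional variant.

```python
from collections import Counter

def fill_most_common(rows, attr, labels=None):
    """Return copies of `rows` with missing (None) values of `attr` filled in.

    Without `labels`, use the most common observed value overall; with `labels`,
    prefer the most common value among examples of the same class.
    Assumes at least one example has an observed value for `attr`.
    """
    def mode(values):
        return Counter(values).most_common(1)[0][0]

    overall = mode([r[attr] for r in rows if r.get(attr) is not None])
    filled = []
    for i, row in enumerate(rows):
        row = dict(row)  # copy so the caller's data is not modified
        if row.get(attr) is None:
            value = overall
            if labels is not None:
                same_class = [r[attr] for r, c in zip(rows, labels)
                              if c == labels[i] and r.get(attr) is not None]
                if same_class:
                    value = mode(same_class)
            row[attr] = value
        filled.append(row)
    return filled

rows = [{"humidity": "high"}, {"humidity": "high"},
        {"humidity": "normal"}, {"humidity": None}]
labels = ["no", "no", "yes", "yes"]
print(fill_most_common(rows, "humidity")[3])          # {'humidity': 'high'}
print(fill_most_common(rows, "humidity", labels)[3])  # {'humidity': 'normal'}
```

The overall mode is `high`, but among examples of class `yes` the only observed value is `normal`, so the two variants fill the gap differently.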
Unable to determine depth of decision tree
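If the training set has no natural end value, i.e. the attribute values are continuous and can be split at ever finer thresholds, tree building can continue to a great depth instead of stopping at a sensible point. A stopping criterion such as a maximum depth is then needed to bound the tree.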