Analyzing decision tree bias towards the minority class

arXiv:2501.04903v4 Announce Type: replace
Abstract: There is a widespread and longstanding belief that machine learning models are biased towards the majority class when learning from imbalanced binary response data, leading them to neglect or ignore the minority class. Motivated by a recent simulation study that found that decision trees can be biased towards the minority class, our paper aims to reconcile the conflict between that study and other published works. First, we critically evaluate past literature on this problem, finding that failing to consider the conditional distribution of the outcome given the predictors has led to incorrect conclusions about the bias in decision trees. We then show that, under specific conditions, decision trees fit to purity are biased towards the minority class, debunking the belief that decision trees are always biased towards the majority class. This bias can be reduced by adjusting the tree-fitting process to include regularization methods like pruning and setting a maximum tree depth, and/or by using post-hoc calibration methods. Our findings have implications on the use of popular tree-based models, such as random forests. Although random forests are often composed of decision trees fit to purity, our work adds to recent literature indicating that this may not be the best approach.

Liked Liked