A Consequentialist Critique of Binary Classification Evaluation: Theory, Practice, and Tools
arXiv:2504.04528v3

Abstract: Machine learning-supported decisions, such as ordering diagnostic tests or determining preventive custody, often require converting probabilistic forecasts into binary classifications. We adopt a consequentialist perspective from decision theory to argue that evaluation methods should prioritize forecast quality across thresholds and base rates. This motivates the use of proper scoring rules such as the Brier score and log loss. However, our empirical review of practices at major ML venues (ICML, FAccT, CHIL) reveals a […]
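The Python sketch below illustrates the two proper scoring rules the abstract names. It assumes binary labels coded as 0/1 and forecasts given as the probability of the positive class. The helper names (`brier_score`, `log_loss`, `bayes_threshold`, `classify`) are hypothetical and do not come from the paper or its tools. `bayes_threshold` applies the standard expected-cost argument from decision theory to show how the cutoff for a binary decision depends on the relative costs of errors. It is an illustration of that general idea, not the paper's method.

```python
import math


def brier_score(probs, labels):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood of 0/1 outcomes under the forecasts.

    Probabilities are clipped to [eps, 1 - eps] so log(0) is never evaluated.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def bayes_threshold(cost_fp, cost_fn):
    """Cutoff that minimizes expected cost for a calibrated probability p.

    Predicting positive costs (1 - p) * cost_fp in expectation; predicting
    negative costs p * cost_fn. Positive is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(probs, threshold):
    """Turn probabilities into 0/1 decisions at the given threshold."""
    return [1 if p > threshold else 0 for p in probs]


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.6, 0.1]
    labels = [1, 0, 1, 0]
    print("Brier score:", brier_score(probs, labels))
    print("Log loss:", log_loss(probs, labels))
    # The same forecasts yield different decisions under different cost ratios.
    for cost_fp, cost_fn in [(1, 4), (4, 1)]:
        t = bayes_threshold(cost_fp, cost_fn)
        print(f"costs FP={cost_fp}, FN={cost_fn}: threshold={t}, decisions={classify(probs, t)}")
```

Both scores are computed from the probabilities alone, so neither depends on a chosen cutoff. The decisions, by contrast, change as the cost ratio moves the threshold. This is the practical gap between judging forecast quality and judging one thresholded classifier.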