Desirability Rating Based Counterfactual (DeRaC) Framework for Multi-Dimensional Classification Problems

Counterfactual explanations are increasingly vital for understanding and trusting machine learning models. This study presents the Desirability Rating based Counterfactual (DeRaC) framework, a generalized approach for generating valid counterfactual explanations for multi-dimensional classification problems, including single- and multi-output classification with binary and multi-label outputs. By expanding the definition of counterfactual validity through a novel “desirability rating,” the approach addresses limitations of existing methods in complex output spaces. The framework introduces the concept of partially valid counterfactuals and a quantitative measure of output desirability. These can be combined with objective functions to find counterfactuals that also satisfy established properties such as similarity, proximity, validity, and actionability. Experiments demonstrate that counterfactuals can be generated systematically with existing optimization techniques, achieving varying degrees of validity and similarity. The research emphasizes the context-dependent nature of counterfactuals and lays a foundation for more transparent and trustworthy machine learning systems.
