Towards Reasonable Concept Bottleneck Models

arXiv:2506.05014v2 Announce Type: replace-cross
Abstract: We propose a novel, flexible, and efficient framework for designing Concept Bottleneck Models (CBMs) that enables practitioners to explicitly encode and extend their prior knowledge and beliefs about the concept-concept ($C-C$) and concept-task ($C \to Y$) relationships within the model's reasoning when making predictions. The resulting $\textbf{C}$oncept $\textbf{REA}$soning $\textbf{M}$odels (CREAMs) architecturally encode arbitrary types of $C-C$ relationships, such as mutual exclusivity, hierarchical associations, and/or correlations, as well as potentially sparse $C \to Y$ relationships. Moreover, CREAM can optionally incorporate a regularized side-channel to complement potentially incomplete concept sets, achieving competitive task performance while encouraging predictions to be concept-grounded. To evaluate CBMs in such settings, we introduce a $C \to Y$-agnostic metric that quantifies interpretability when predictions partially rely on the side-channel. In our experiments, we show that, without additional computational overhead, CREAM models support efficient interventions, can avoid concept leakage, and achieve black-box-level performance under missing concepts. We further analyze how an optional side-channel affects interpretability and intervenability. Importantly, the side-channel enables CBMs to remain effective even in scenarios where only a limited number of concepts are available.
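The general shape of the architecture the abstract describes — a concept predictor, a (potentially sparse) $C \to Y$ head, and an optional regularized side-channel that bypasses the bottleneck — can be sketched roughly as below. All layer names, dimensions, and the side-channel scaling factor are illustrative assumptions, not the paper's actual implementation; the real CREAM models additionally encode $C-C$ constraints architecturally, which this minimal sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative dimensions (assumptions, not from the paper).
in_dim, n_concepts, n_classes = 16, 8, 3

# Hypothetical random weights standing in for trained parameters.
W_concept = rng.normal(size=(in_dim, n_concepts))  # input -> concept logits
W_task = rng.normal(size=(n_concepts, n_classes))  # C -> Y head (would be L1-sparsified in training)
W_side = rng.normal(size=(in_dim, n_classes))      # regularized side-channel path

def cream_forward(x, use_side_channel=True):
    """Forward pass: concept-grounded prediction plus optional residual side-channel."""
    c = sigmoid(x @ W_concept)          # predicted concept activations in [0, 1]
    y = c @ W_task                      # prediction routed through the concept bottleneck
    if use_side_channel:
        # Down-weighted residual path to compensate for an incomplete concept set;
        # the 0.1 factor stands in for the paper's regularization and is an assumption.
        y = y + 0.1 * (x @ W_side)
    return c, y

x = rng.normal(size=(4, in_dim))
c, y = cream_forward(x)
```

Because concepts are exposed as explicit activations `c`, test-time interventions amount to overwriting entries of `c` before applying the task head; disabling the side-channel forces predictions to be fully concept-grounded.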
