Regret Is Weighted Forgetting
How much of an agent’s regret comes from a bad representation, and how much from a bad policy? This paper gives an exact answer. For a fixed representation M and an evaluation distribution with finite support over history-test pairs, the minimum average normalised regret over all M-based policies equals the minimum margin-weighted deletion cost needed to make the optimal bet single-valued on each representation-test cell (M(h), T). A policy-wise decomposition then splits any actual policy’s regret into an irreducible aliasing cost plus avoidable within-cell misreporting. A Stack-Theoretic reformulation identifies the same quantity as a deficit in weighted weakness on a lifted task constructed from the evaluation support (where weakness is, informally, the degree to which a policy leaves open unseen diagnostic continuations). I use the identity to derive several direct corollaries, including a representation-convergence theorem in pure RL language, a regret-based partial order on abstractions, Lipschitz stability of K_ρ under margin estimation error, and connections to free energy and multi-agent coordination. A cross-framework corollary converts the regret floor into a generalisation probability: under the canonical independent prior, the optimal M-based policy generalises with probability exp(-K_ρ(M)). The multi-class generalisation to K > 2 diagnostic outcomes is proved. Controlled POMDP experiments confirm that the decomposition is numerically exact and that K_ρ discriminates between representations where accuracy and raw impurity do not. The weakness-maximisation theorems predict optimal generalisation through least commitment, but their formal object (the extension of a policy in an embodied language) has no direct analogue in neural-network function approximation. Bridging that gap is identified as an open problem.
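The regret-floor identity can be checked mechanically on small finite supports. The sketch below is an illustrative Python implementation under assumed data structures, not the paper's code: the evaluation support is represented as a list of (history, test, optimal_bet, weight) tuples, with weights taken to be the normalised margin-weighted masses, and the representation M is a callable on histories. The names K_rho and min_regret_brute_force are hypothetical. It computes K_ρ(M) as the margin-weighted deletion cost (per cell, keep the heaviest bet and forget the rest) and confirms by brute force over M-based policies that this equals the minimum average regret.

```python
import math
from collections import defaultdict
from itertools import product


def K_rho(support, M):
    """Margin-weighted deletion cost of representation M.

    For each cell (M(h), T), keep the single bet carrying the most weight and
    'delete' (forget) the rest; the total deleted mass is K_rho(M).
    """
    cell_mass = defaultdict(float)   # total weight landing in each cell
    bet_mass = defaultdict(float)    # weight of each candidate bet within a cell
    for h, T, bet, w in support:
        cell = (M(h), T)
        cell_mass[cell] += w
        bet_mass[(cell, bet)] += w
    best = defaultdict(float)
    for (cell, bet), w in bet_mass.items():
        best[cell] = max(best[cell], w)
    return sum(cell_mass[c] - best[c] for c in cell_mass)


def min_regret_brute_force(support, M, bets):
    """Minimum margin-weighted regret over all M-based policies.

    An M-based policy reports one bet per cell (M(h), T); its regret is the
    total weight of the pairs whose optimal bet it misreports.
    """
    cells = sorted({(M(h), T) for h, T, _, _ in support})
    best_regret = float("inf")
    for assignment in product(bets, repeat=len(cells)):
        policy = dict(zip(cells, assignment))
        regret = sum(w for h, T, bet, w in support if policy[(M(h), T)] != bet)
        best_regret = min(best_regret, regret)
    return best_regret


if __name__ == "__main__":
    # Toy aliasing: histories h1 and h2 collapse to the same abstract state s0
    # but demand opposite bets on test T, with margin weights 0.3 vs 0.2.
    support = [
        ("h1", "T", "yes", 0.3),
        ("h2", "T", "no",  0.2),
        ("h3", "T", "no",  0.5),
    ]
    M = lambda h: {"h1": "s0", "h2": "s0", "h3": "s1"}[h]
    k = K_rho(support, M)
    assert abs(k - min_regret_brute_force(support, M, ["yes", "no"])) < 1e-12
    print(f"K_rho(M) = {k:.3f}")                     # 0.200: the lighter side of the aliased cell
    print(f"exp(-K_rho(M)) = {math.exp(-k):.3f}")    # generalisation probability under the assumed prior
```

On this toy support the identity is exact by construction: choosing the heaviest bet per cell is precisely the regret-minimising M-based policy, and the forgotten (deleted) mass is its regret. The final line illustrates the cross-framework corollary exp(-K_ρ(M)) only numerically; the canonical independent prior it relies on is defined in the paper, not in this sketch.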