Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
arXiv:2601.18175v1 Announce Type: cross Abstract: A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names — rejection sampling with SFT, goal-conditioned RL, Decision Transformers — yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing […]
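The collect, filter, imitate loop described in the abstract can be illustrated with a minimal sketch. It assumes a one-step problem with a known per-action success probability, and it uses the exact Bayes-rule conditional in place of sampling and fitting. The names `success_condition`, `success_rate`, `policy`, and `success_prob` are hypothetical. This is a toy illustration, not the paper's construction or its trust-region result.

```python
def success_condition(policy, success_prob):
    """Condition a one-step policy on success (toy, assumed setting).

    policy:       dict mapping action -> probability under the current policy
    success_prob: dict mapping action -> probability that the action succeeds
    Returns P(action | success), the policy that imitating successful
    samples would converge to with unlimited data.
    """
    # Joint probability of picking action a and then succeeding.
    joint = {a: policy[a] * success_prob[a] for a in policy}
    total = sum(joint.values())  # success rate of the current policy
    if total == 0:
        raise ValueError("current policy never succeeds; nothing to imitate")
    return {a: joint[a] / total for a in policy}


def success_rate(policy, success_prob):
    """Probability of success when acting according to `policy`."""
    return sum(policy[a] * success_prob[a] for a in policy)


# Hypothetical two-action example.
policy = {"left": 0.5, "right": 0.5}
success_prob = {"left": 0.2, "right": 0.6}

new_policy = success_condition(policy, success_prob)
old_rate = success_rate(policy, success_prob)
new_rate = success_rate(new_policy, success_prob)
```

In this toy setting the conditioned policy shifts probability toward the action that succeeds more often, so its success rate cannot fall below that of the original policy.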