Risk-Gated Hierarchical Option Policies for Budgeted Web Navigation with Irreversible-Action Failure
Long-horizon web navigation requires both strategic planning and local UI manipulation, and its failures are often triggered by a small subset of irreversible actions (submit, delete, purchase). We propose a hierarchical framework with options: a high-level manager selects subgoals (search, filter, compare, checkout), while low-level option policies execute UI actions. To control failures under multi-cost budgets, we introduce the Risk-Gated Option Critic (RGOC), in which each option is equipped with a learned hazard predictor that estimates the probability of entering a failure-absorbing set within k steps. The manager performs budgeted subgoal selection using a constrained Bellman backup with multi-cost penalties (requests, latency, fees) and a risk gate that suppresses option activation when the predicted hazard exceeds a learned threshold tied to the remaining budget. We recommend training on 800–1,500 multi-step tasks across 30–60 sites and measuring (i) irreversible-action failure rate, (ii) average steps-to-success, (iii) budget adherence, and (iv) risk calibration (expected calibration error, ECE, of the hazard predictor). RGOC is designed to outperform flat policies by reducing compounding errors and localizing safety constraints at the option boundary, which should improve robustness under tight budgets.
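
The manager's gated selection step can be illustrated with a minimal sketch. It is not the RGOC implementation: the names `OptionEstimate`, `risk_threshold`, and `select_option` are hypothetical, the linear threshold schedule (with `tau_min` and `tau_max`) stands in for the learned threshold, the assumption that a tighter budget lowers the threshold is ours, and a one-step penalized score with fixed multipliers `lambdas` stands in for the constrained Bellman backup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OptionEstimate:
    """Hypothetical per-option estimates available to the manager."""
    name: str
    value: float             # estimated return of running the option
    costs: Dict[str, float]  # expected cost per budget dimension
    hazard: float            # predicted P(enter failure set within k steps)


def risk_threshold(remaining: Dict[str, float], initial: Dict[str, float],
                   tau_min: float = 0.05, tau_max: float = 0.30) -> float:
    """Assumed stand-in for the learned threshold: interpolate linearly
    on the fraction left of the tightest budget dimension."""
    frac = min(remaining[c] / initial[c] for c in initial)
    frac = max(0.0, min(1.0, frac))
    return tau_min + (tau_max - tau_min) * frac


def select_option(options: List[OptionEstimate],
                  remaining: Dict[str, float],
                  initial: Dict[str, float],
                  lambdas: Dict[str, float]) -> Optional[OptionEstimate]:
    """Return the best admissible option, or None if every option is gated."""
    tau = risk_threshold(remaining, initial)
    best, best_score = None, float("-inf")
    for opt in options:
        # Risk gate: suppress options whose hazard exceeds the threshold.
        if opt.hazard > tau:
            continue
        # Budget feasibility: expected cost must fit every remaining budget.
        if any(opt.costs.get(c, 0.0) > remaining[c] for c in remaining):
            continue
        # Penalized score: value minus weighted multi-cost penalties.
        penalty = sum(lambdas[c] * opt.costs.get(c, 0.0) for c in lambdas)
        score = opt.value - penalty
        if score > best_score:
            best, best_score = opt, score
    return best


# Illustrative numbers only; none of these values come from experiments.
initial = {"requests": 20.0, "latency": 30.0, "fees": 5.0}
lambdas = {"requests": 0.05, "latency": 0.02, "fees": 0.10}
options = [
    OptionEstimate("search", 1.0,
                   {"requests": 2.0, "latency": 1.0, "fees": 0.0}, 0.02),
    OptionEstimate("checkout", 5.0,
                   {"requests": 3.0, "latency": 2.0, "fees": 1.0}, 0.25),
]
tight = {"requests": 4.0, "latency": 30.0, "fees": 5.0}
full_choice = select_option(options, dict(initial), initial, lambdas)
tight_choice = select_option(options, tight, initial, lambdas)
```

In this toy setting the high-value but high-hazard `checkout` option is admissible with a full budget and gated once the request budget runs low, so the manager falls back to `search`. Returning `None` when nothing passes the gate leaves the fallback behavior (for example, terminating the episode) to the caller.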