Safety-Critical Reinforcement Learning with Viability-Based Action Shielding for Hypersonic Longitudinal Flight
arXiv:2602.03968v1 Announce Type: new
Abstract: This paper presents a safety-critical reinforcement learning framework for nonlinear dynamical systems with continuous state and input spaces operating under explicit physical constraints. Hard safety constraints are enforced independently of the reward through action shielding and reachability-based admissible action sets, ensuring that unsafe behaviors are never intentionally selected during learning or execution. To capture nominal operation and recovery behavior within a single control architecture, the state space is partitioned into safe and unsafe regions based on membership in a safety box, and a mode-dependent reward is used to promote accurate tracking inside the safe region and recovery toward it when operating outside. To enable online tabular learning on continuous dynamics, a finite-state abstraction is constructed via state aggregation, and action selection and value updates are consistently restricted to admissible actions. The framework is demonstrated on a longitudinal point-mass hypersonic vehicle model with aerodynamic and propulsion couplings, using angle of attack and throttle as control inputs.