What if RL agents were ranked by collapse resistance, not just reward?
I’ve been experimenting with a small RL evaluation scaffold I call ARCUS-H (Adaptive Robustness & Collapse Under Stress).
The idea is simple:
Most RL benchmarks evaluate agents only on reward in stationary environments.
ARCUS evaluates agents under structured stress schedules:
- pre → shock → post
- trust violation (action corruption)
- resource constraint
- valence inversion (reward flip)
- concept drift
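To make the schedule idea concrete, here's a minimal sketch of how a stressor can be gated to the shock window of an episode. All names and thresholds here are mine for illustration, not the actual ARCUS-H code:

```python
import random

# Hypothetical sketch of an ARCUS-style stress schedule: an episode is
# split into pre -> shock -> post phases, and a stressor is applied only
# during the shock window. Phase boundaries are illustrative defaults.

def phase(t, pre_end=100, shock_end=200):
    """Return which phase timestep t falls in."""
    if t < pre_end:
        return "pre"
    elif t < shock_end:
        return "shock"
    return "post"

def trust_violation(action, n_actions, p=0.3, rng=random):
    """Replace the agent's chosen action with a random one w.p. p."""
    if rng.random() < p:
        return rng.randrange(n_actions)
    return action

def valence_inversion(reward):
    """Flip the sign of the reward."""
    return -reward

def step_with_stress(t, action, n_actions):
    """Apply the action-corruption stressor only during the shock phase."""
    if phase(t) == "shock":
        action = trust_violation(action, n_actions)
    return action
```

In a real run this would sit in a Gym-style wrapper around `env.step`, with one stressor active per schedule.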
For each episode, we track:
- reward
- identity trajectory (coherence / integrity / meaning proxy components)
- collapse score
- collapse event rate during shock
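For concreteness, here's a minimal sketch of the per-episode bookkeeping. The identity components and the collapse threshold are placeholder definitions, not the exact ARCUS-H proxies:

```python
# Placeholder identity proxy and collapse bookkeeping (my sketch, not the
# actual ARCUS-H definitions). Identity is a scalar in [0, 1] built from
# coherence / integrity / meaning components; a collapse event fires when
# identity drops below a threshold at a step.

COLLAPSE_THRESHOLD = 0.3  # illustrative value

def identity(coherence, integrity, meaning):
    """Scalar identity proxy: mean of the three components, each in [0, 1]."""
    return (coherence + integrity + meaning) / 3.0

def episode_stats(identity_trace, phases):
    """Summarize one episode: mean identity and collapse rate during shock."""
    shock = [i for i, ph in zip(identity_trace, phases) if ph == "shock"]
    collapses = sum(1 for i in shock if i < COLLAPSE_THRESHOLD)
    return {
        "identity_mean": sum(identity_trace) / len(identity_trace),
        "collapse_rate_shock": collapses / len(shock) if shock else 0.0,
    }
```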
Then we rank algorithms by a robustness score:
0.55 * identity_mean + 0.30 * (1 - collapse_rate_shock) + 0.15 * normalized_reward
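The ranking formula itself is just a weighted sum over [0, 1]-normalized inputs:

```python
def robustness_score(identity_mean, collapse_rate_shock, normalized_reward):
    """Composite robustness score; all inputs assumed normalized to [0, 1]."""
    return (0.55 * identity_mean
            + 0.30 * (1.0 - collapse_rate_shock)
            + 0.15 * normalized_reward)
```

A perfectly stable, high-reward agent scores 1.0; an agent that collapses throughout the shock phase loses the full 0.30 term regardless of reward.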
I ran PPO, A2C, DQN, TRPO, SAC, TD3, and DDPG across:
- CartPole-v1
- Acrobot-v1
- MountainCar-v0
- MountainCarContinuous-v0
- Pendulum-v1
with seeds 0–9.
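The sweep grid is small enough to enumerate directly. A sketch of how I'd lay it out (the full grid is 350 runs; in practice incompatible pairs like DQN on continuous-action envs would be skipped, which I'm noting as my assumption rather than ARCUS-H behavior):

```python
from itertools import product

# Sweep grid: algorithm x environment x seed.
ALGOS = ["PPO", "A2C", "DQN", "TRPO", "SAC", "TD3", "DDPG"]
ENVS = ["CartPole-v1", "Acrobot-v1", "MountainCar-v0",
        "MountainCarContinuous-v0", "Pendulum-v1"]
SEEDS = range(10)  # seeds 0-9

# Note: discrete-only algorithms (e.g. DQN) can't run on continuous-control
# envs, and SAC/TD3/DDPG need continuous action spaces, so the runner would
# filter this grid by action-space compatibility.
runs = list(product(ALGOS, ENVS, SEEDS))
```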
Interesting observations:
• Some high-reward agents collapse heavily under trust_violation
• Continuous-control algorithms behave differently under action corruption
• Identity trajectories reveal instability that reward alone hides
• Shock-phase collapse rates differentiate algorithms more than baseline reward
This raises a question:
Should RL benchmarks incorporate structured stress testing the way we do in control theory or safety engineering?
Would love feedback:
- Is this redundant with existing robustness benchmarks?
- Are the stress models realistic enough?
- What failure modes am I missing?
submitted by /u/Less_Conclusion9066