What if RL agents were ranked by collapse resistance, not just reward?

I’ve been experimenting with a small RL evaluation scaffold I call ARCUS-H (Adaptive Robustness & Collapse Under Stress).

The idea is simple:

Most RL benchmarks evaluate agents only on reward in stationary environments.

ARCUS evaluates agents under structured stress schedules:

  • pre → shock → post
  • trust violation (action corruption)
  • resource constraint
  • valence inversion (reward flip)
  • concept drift
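The ARCUS-H code isn't shown here, so here's a minimal sketch of what one of these stress schedules (trust violation, i.e. action corruption during a pre → shock → post window) could look like as a Gym-style wrapper. All names, phase boundaries, and the corruption rate are my assumptions, not ARCUS-H's:

```python
import numpy as np

class TrustViolationWrapper:
    """Wraps any Gym-style env and corrupts a fraction of the agent's
    actions during a pre -> shock -> post schedule. Illustrative only:
    phase boundaries and corruption rate are hypothetical."""

    def __init__(self, env, shock_start=100, shock_end=200,
                 corrupt_prob=0.3, seed=0):
        self.env = env
        self.shock_start, self.shock_end = shock_start, shock_end
        self.corrupt_prob = corrupt_prob
        self.rng = np.random.default_rng(seed)
        self.t = 0  # global step counter driving the schedule

    def phase(self):
        if self.t < self.shock_start:
            return "pre"
        return "shock" if self.t < self.shock_end else "post"

    def reset(self, **kwargs):
        self.t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        # During shock, replace the chosen action with a random one
        # with probability corrupt_prob (the "trust violation").
        if self.phase() == "shock" and self.rng.random() < self.corrupt_prob:
            action = self.env.action_space.sample()
        self.t += 1
        return self.env.step(action)
```

The same skeleton covers the other schedules: valence inversion would negate the returned reward inside `step`, and concept drift would perturb the observation or transition dynamics on a schedule instead of the action.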

For each episode, we track:

  • reward
  • identity trajectory (coherence / integrity / meaning proxy components)
  • collapse score
  • collapse event rate during shock
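One of those metrics, the collapse event rate during shock, reduces to a simple aggregation over per-step logs. A sketch, assuming a per-step boolean collapse flag and phase label (the log schema here is hypothetical, not ARCUS-H's):

```python
import numpy as np

def collapse_rate_shock(collapse_flags, phases):
    """Fraction of shock-phase steps flagged as collapse events.

    collapse_flags: per-step 0/1 indicators (hypothetical log format)
    phases: per-step phase labels, e.g. "pre" / "shock" / "post"
    """
    shock = np.asarray(phases) == "shock"
    if not shock.any():
        return 0.0  # no shock steps logged
    return float(np.asarray(collapse_flags, dtype=float)[shock].mean())
```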

Then we rank algorithms by a robustness score:

  robustness = 0.55 * identity_mean + 0.30 * (1 - collapse_rate_shock) + 0.15 * normalized_reward
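As a direct transcription of that formula (assuming, as seems intended, that all three inputs are normalized to [0, 1] so the score is too):

```python
def robustness_score(identity_mean, collapse_rate_shock, normalized_reward):
    """Composite ranking score from the post; weights sum to 1.0.
    Assumes all three inputs lie in [0, 1]."""
    return (0.55 * identity_mean
            + 0.30 * (1.0 - collapse_rate_shock)
            + 0.15 * normalized_reward)
```

A perfectly stable, high-reward agent (identity 1, no shock collapses, reward 1) scores 1.0; an agent that collapses on every shock step loses the full 0.30 term regardless of reward.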

I ran PPO, A2C, DQN, TRPO, SAC, TD3, and DDPG across:

  • CartPole-v1
  • Acrobot-v1
  • MountainCar-v0
  • MountainCarContinuous-v0
  • Pendulum-v1

with seeds 0–9.

Interesting observations:

• Some high-reward agents collapse heavily under trust_violation
• Continuous-control algorithms behave differently under action corruption
• Identity trajectories reveal instability that reward alone hides
• Shock-phase collapse rates differentiate algorithms more than baseline reward


This raises a question:

Should RL benchmarks incorporate structured stress testing the way we do in control theory or safety engineering?

Would love feedback:

  • Is this redundant with existing robustness benchmarks?
  • Are the stress models realistic enough?
  • What failure modes am I missing?

submitted by /u/Less_Conclusion9066