What if RL agents were ranked by collapse resistance, not just reward?

I’ve been experimenting with a small RL evaluation scaffold I call ARCUS-H (Adaptive Robustness & Collapse Under Stress).

The idea is simple:

Most RL benchmarks evaluate agents only on reward in stationary environments.

ARCUS evaluates agents under structured stress schedules:

  • pre → shock → post
  • trust violation (action corruption)
  • resource constraint
  • valence inversion (reward flip)
  • concept drift
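The ARCUS-H code isn't shown here, so here's a minimal sketch of what one of these stress schedules (trust violation, i.e. action corruption during a pre → shock → post window) could look like as a Gym-style wrapper. All names, phase boundaries, and the corruption rate are my assumptions, not ARCUS-H's:

```python
import numpy as np

class TrustViolationWrapper:
    """Wraps any Gym-style env and corrupts a fraction of the agent's
    actions during a pre -> shock -> post schedule. Illustrative only:
    phase boundaries and corruption rate are hypothetical."""

    def __init__(self, env, shock_start=100, shock_end=200,
                 corrupt_prob=0.3, seed=0):
        self.env = env
        self.shock_start, self.shock_end = shock_start, shock_end
        self.corrupt_prob = corrupt_prob
        self.rng = np.random.default_rng(seed)
        self.t = 0  # global step counter driving the schedule

    def phase(self):
        if self.t < self.shock_start:
            return "pre"
        return "shock" if self.t < self.shock_end else "post"

    def reset(self, **kwargs):
        self.t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        # During shock, replace the chosen action with a random one
        # with probability corrupt_prob (the "trust violation").
        if self.phase() == "shock" and self.rng.random() < self.corrupt_prob:
            action = self.env.action_space.sample()
        self.t += 1
        return self.env.step(action)
```

The same skeleton covers the other schedules: valence inversion would negate the returned reward inside `step`, and concept drift would perturb the observation or transition dynamics on a schedule instead of the action.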

For each episode, we track:

  • reward
  • identity trajectory (coherence / integrity / meaning proxy components)
  • collapse score
  • collapse event rate during shock
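One of those metrics, the collapse event rate during shock, reduces to a simple aggregation over per-step logs. A sketch, assuming a per-step boolean collapse flag and phase label (the log schema here is hypothetical, not ARCUS-H's):

```python
import numpy as np

def collapse_rate_shock(collapse_flags, phases):
    """Fraction of shock-phase steps flagged as collapse events.

    collapse_flags: per-step 0/1 indicators (hypothetical log format)
    phases: per-step phase labels, e.g. "pre" / "shock" / "post"
    """
    shock = np.asarray(phases) == "shock"
    if not shock.any():
        return 0.0  # no shock steps logged
    return float(np.asarray(collapse_flags, dtype=float)[shock].mean())
```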

Then we rank algorithms by a robustness score:

  robustness = 0.55 * identity_mean + 0.30 * (1 - collapse_rate_shock) + 0.15 * normalized_reward
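As a direct transcription of that formula (assuming, as seems intended, that all three inputs are normalized to [0, 1] so the score is too):

```python
def robustness_score(identity_mean, collapse_rate_shock, normalized_reward):
    """Composite ranking score from the post; weights sum to 1.0.
    Assumes all three inputs lie in [0, 1]."""
    return (0.55 * identity_mean
            + 0.30 * (1.0 - collapse_rate_shock)
            + 0.15 * normalized_reward)
```

A perfectly stable, high-reward agent (identity 1, no shock collapses, reward 1) scores 1.0; an agent that collapses on every shock step loses the full 0.30 term regardless of reward.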

I ran PPO, A2C, DQN, TRPO, SAC, TD3, and DDPG across:

  • CartPole-v1
  • Acrobot-v1
  • MountainCar-v0
  • MountainCarContinuous-v0
  • Pendulum-v1

with seeds 0–9.

Interesting observations:

• Some high-reward agents collapse heavily under trust_violation
• Continuous-control algorithms behave differently under action corruption
• Identity trajectories reveal instability that reward alone hides
• Shock-phase collapse rates differentiate algorithms more than baseline reward


This raises a question:

Should RL benchmarks incorporate structured stress testing the way we do in control theory or safety engineering?

Would love feedback:

  • Is this redundant with existing robustness benchmarks?
  • Are the stress models realistic enough?
  • What failure modes am I missing?

submitted by /u/Less_Conclusion9066