SAC model collapse (?) after 950k steps

digitado ⋅ May 29, 2026

TL;DR: my SAC model failed catastrophically after 950k steps. Entropy went through the roof, and mean reward and episode length dropped to almost 0. What the hell happened? How do I stop it from happening again? Is the model recoverable?

I’ve been working on a bipedal robot with point feet, trying to get it to just pace on the spot. After weeks of models settling on useless policies, I discovered the Constraints as Terminations (CaT) framework from this paper. Their results looked promising, so I had a go at applying the principles to a SAC agent.

(For those uninterested in the specific details of my constraints implementation, skip to the next paragraph.)

I used a leaky integrator to model constraint violation density; an episode ends once the violation density crosses a set threshold. This was coupled with an asymmetric actor-critic architecture, in which the critic is fed the violation densities. The specific constraints I decided to try for my first iteration were:

– no self leg contact

– torso must be above a minimum height

– only one leg should touch the ground at a time, following a corresponding CPG (central pattern generator); this was borrowed from the above paper
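For readers who haven't seen the idea, here is a minimal sketch of a leaky integrator used as a violation-density tracker. It is not my actual implementation: the class name, the `decay` and `threshold` values, and the one-tracker-per-constraint layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LeakyViolationTracker:
    """Hypothetical tracker for one constraint (illustrative names and values)."""
    decay: float = 0.95      # assumed leak factor: fraction of density kept each step
    threshold: float = 0.5   # assumed density above which the episode terminates
    density: float = 0.0     # running violation density, stays in [0, 1]

    def reset(self) -> None:
        # Call at the start of every episode.
        self.density = 0.0

    def step(self, violated: bool) -> bool:
        # Exponential moving average of the binary violation signal.
        self.density = self.decay * self.density + (1.0 - self.decay) * float(violated)
        # True means "terminate the episode now".
        return self.density > self.threshold


tracker = LeakyViolationTracker(decay=0.5, threshold=0.7)
first = tracker.step(True)    # density 0.5, below threshold
second = tracker.step(True)   # density 0.75, above threshold
```

The point of the leak is that a brief violation decays away, while a sustained one accumulates until the episode ends. The densities themselves are what would be passed to the critic in an asymmetric setup.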

My previous model tried to encourage stepping in place using only rewards rather than terminations, and that was the main obstacle I kept running into when trying to get a model to step indefinitely.

The new model was training well. It had surpassed my previous model by a considerable margin and showed no signs of stopping. However, after 950k training steps there was a complete model failure (I’m not sure if collapse is the right term here?). My entropy coefficient shot up from ~0.05 to over 100, and my rewards and episode lengths dropped to almost zero. The actor loss went through the floor and the critic loss through the roof. I had a look at some episodes: before 950k the model was stepping relatively well and approaching a decent policy; after, it fell over almost instantly. Worth noting that my previous best model had surpassed 1M steps with no issues.
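To make the entropy-coefficient jump concrete, here is a rough sketch of the usual automatic temperature update in SAC, assuming the coefficient is auto-tuned through `log_alpha` with the common loss `-log_alpha * (mean_log_prob + target_entropy)`. This is not a diagnosis of my run. It uses plain gradient descent instead of Adam, and the function name, the numbers, and the optional `log_alpha_max` clamp are all hypothetical.

```python
import math


def update_log_alpha(log_alpha, mean_log_prob, target_entropy,
                     lr=3e-4, log_alpha_max=None):
    """One plain gradient-descent step on the SAC temperature loss.

    loss = -log_alpha * (mean_log_prob + target_entropy)
    d(loss)/d(log_alpha) = -(mean_log_prob + target_entropy)
    """
    grad = -(mean_log_prob + target_entropy)
    log_alpha = log_alpha - lr * grad
    if log_alpha_max is not None:
        # Hypothetical safeguard: cap the coefficient so it cannot run away.
        log_alpha = min(log_alpha, log_alpha_max)
    return log_alpha


# Assumed numbers: policy entropy stuck below target (entropy = -mean_log_prob).
log_alpha = math.log(0.05)
for _ in range(10_000):
    log_alpha = update_log_alpha(log_alpha, mean_log_prob=10.0, target_entropy=-6.0)
alpha_unclamped = math.exp(log_alpha)

log_alpha = math.log(0.05)
for _ in range(10_000):
    log_alpha = update_log_alpha(log_alpha, mean_log_prob=10.0, target_entropy=-6.0,
                                 log_alpha_max=0.0)
alpha_clamped = math.exp(log_alpha)
```

Because the update acts on `log_alpha`, a policy whose entropy stays below the target pushes the coefficient up exponentially, so a move from ~0.05 to over 100 is at least possible under this mechanism. Whether that is what happened here is unverified.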

What the hell happened? Is the model recoverable, or is the replay buffer now full of garbage from the last 50k training steps (I stopped at 1M)? How do I prevent this happening again in the future?
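One safeguard I could imagine is a monitor that flags a suspected collapse early, so training can be halted and rolled back to an earlier checkpoint before the replay buffer fills with bad episodes. A minimal sketch follows, assuming positive episode rewards. The class name, window size, and thresholds are hypothetical, and this is not tied to any particular library.

```python
from collections import deque


class CollapseMonitor:
    """Hypothetical early-warning check on episode reward and entropy coefficient."""

    def __init__(self, window=20, drop_ratio=0.5, alpha_limit=5.0):
        self.rewards = deque(maxlen=window)  # most recent episode rewards
        self.best_mean = None                # best full-window mean seen so far
        self.drop_ratio = drop_ratio         # flag if mean falls below this share of best
        self.alpha_limit = alpha_limit       # flag if entropy coefficient exceeds this

    def update(self, episode_reward, alpha):
        """Record one finished episode; return True if a collapse is suspected."""
        self.rewards.append(episode_reward)
        if alpha > self.alpha_limit:
            return True
        if len(self.rewards) < self.rewards.maxlen:
            return False  # not enough episodes yet to judge the reward trend
        mean = sum(self.rewards) / len(self.rewards)
        if self.best_mean is None or mean > self.best_mean:
            self.best_mean = mean
        # Assumes rewards are positive, so a ratio against the best mean is meaningful.
        return self.best_mean > 0 and mean < self.drop_ratio * self.best_mean


monitor = CollapseMonitor(window=3, drop_ratio=0.5, alpha_limit=5.0)
flags = [monitor.update(r, alpha=0.05) for r in (100, 100, 100, 0, 0)]
```

When `update` returns True, the training loop would stop and reload the last checkpoint saved while the flag was still False, ideally together with a replay buffer snapshot from the same point.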

submitted by /u/DirectPalpitation523