RL people: what’s the dumbest or longest-running bug you’ve ever had in a training run?

I’m new to RL and genuinely can’t tell what’s “normal” anymore.

What’s the longest you’ve spent debugging a training run before finding the real issue? What was the bug in the end?

Could be anything:

  • reward scaling
  • bad env logic
  • normalization issues
  • action masking
  • replay buffer bugs
  • training silently diverging
  • etc.

I keep losing absurd amounts of time to tiny mistakes and I’m trying to figure out whether that’s just part of RL.

submitted by /u/Illustrious_Song425