RL people: what’s the dumbest or longest-running bug you’ve ever had in a training run?
I’m new to RL and genuinely can’t tell what’s “normal” anymore.
What’s the longest you’ve spent debugging a training run before finding the real issue? What was the bug in the end?
Could be anything:
- reward scaling
- bad env logic
- normalization issues
- action masking
- replay buffer bugs
- training silently diverging
- etc.
I keep losing absurd amounts of time to tiny mistakes and I’m trying to figure out whether that’s just part of RL.
submitted by /u/Illustrious_Song425