Removing PER from Rainbow DQN improved performance on Snake. New record of 153 on 20×20 grid.
Greetings all! I’m running a systematic Rainbow DQN ablation on Snake (20×20 grid), adding one component at a time. The most surprising result so far: removing Prioritised Experience Replay (PER) from full Rainbow didn’t just match performance, it set a new record.
Full Rainbow (with PER): record 134
C51 without PER (everything else identical): record 153
Controlled evaluation at the 50K-episode checkpoint (20,000 eval episodes, deterministic policy, same seeds): C51 without PER outperformed full Rainbow at every percentile. Average +45%, p50 +35%, p90 +39%, with zero overlap between the two score distributions.
Tested across 5 seeds. Individual seeds are noisy with occasional flips, but the mean across all 5 favours removing PER.
What I think is the reason: Snake is a dense-reward task. Food is frequent, TD errors are relatively uniform across the buffer, and 2048 parallel environments already ensure replay diversity. PER’s priority mechanism has nothing meaningful to prioritise. Meanwhile the IS weight correction still suppresses gradients. You pay the overhead without the benefit.
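To make the "nothing to prioritise" point concrete, here's a minimal numpy sketch. It uses the standard PER formulas (priorities p_i ∝ |δ_i|^α, IS weights w_i = (N·p_i)^−β normalised by the max weight) with a hypothetical dense-reward TD-error distribution of my own choosing: errors tightly clustered around a common value. The specific numbers (buffer size, α, β, error spread) are illustrative assumptions, not my actual training config.

```python
# Sketch: when TD errors are near-uniform, PER's sampling distribution
# collapses to (approximately) uniform, so you pay the sum-tree overhead
# for sampling that plain uniform replay would give you for free.
# All constants below are hypothetical, chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                  # replay buffer size (assumed)
alpha, beta = 0.5, 0.4       # common PER hyperparameters (assumed)

# Dense-reward regime: TD errors clustered tightly around one value.
td_errors = rng.normal(loc=1.0, scale=0.05, size=N)

# PER sampling probabilities: p_i = |delta_i|^alpha / sum_j |delta_j|^alpha
priorities = np.abs(td_errors) ** alpha
probs = priorities / priorities.sum()

# How far is this from uniform sampling? Max relative deviation is small.
uniform = 1.0 / N
max_rel_dev = np.max(np.abs(probs - uniform)) / uniform

# IS weights: w_i = (N * p_i)^(-beta), normalised by the max weight.
weights = (N * probs) ** (-beta)
weights /= weights.max()

print(f"max relative deviation from uniform: {max_rel_dev:.3f}")
print(f"mean IS weight applied to gradients: {weights.mean():.3f}")
```

With a tight error distribution like this, the sampling probabilities sit within a few percent of uniform and the IS weights sit just below 1, so the priority machinery is doing essentially nothing while every sample still costs an O(log N) sum-tree update.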
This is consistent with Hessel et al.’s original context. Their finding that PER was a top-2 Rainbow component was measured on Atari, which is sparse-reward with high TD error variance. Snake is roughly the opposite. Pan et al. and Ivgi et al. have independently documented similar PER underperformance on dense-reward tasks.
The previous best peer-reviewed result on 20×20 Snake that I’m aware of was 62 (Sebastianelli et al., 2021). The 153 here is roughly 2.5× that.
Has anyone else observed PER underperforming on dense-reward tasks? Curious whether this generalises beyond Snake. I’m planning to test on Tetris next.
submitted by /u/statphantom